Published quarterly by the Institute of Mathematical Statistics in March, 
June, September and December at Baltimore, Maryland. 







INVARIANCE THEORY AND A MODIFIED MINIMAX PRINCIPLE 


By Oscar Wesler¹
Stanford University and University of Michigan 


TABLE OF CONTENTS 


1. Introduction and summary

2. The use of invariance as a slicing principle

3. The principle of invariance in testing hypotheses

4. The modified minimax principle, completeness, and the generalized Hunt-Stein theorem

5. Mixed games, and the use of previous experience
Acknowledgment 
References 


1. Introduction and summary. One of the unpleasant facts about statistical
decision problems is that they are generally much too big or too difficult to admit
of practical solutions, a fact that is threatening to widen even further the gap
between the theory and application of this brave new discipline. Briefly, the
situation is this. For each possible decision procedure $\varphi$, the statistician is
concerned only with the values $\rho(\omega, \varphi)$ of the risk function as $\omega$ ranges over
the set $\Omega$ of all possible states of nature, so that a choice of a decision procedure
amounts to a choice of a risk function. The obvious difficulty of comparing functions in
the search for a best procedure now arises, constituting a major problem for the
statistician. The Bayes and minimax principles, it should be noted, represent
but two extremes to which the statistician can go to get around this difficulty
rather than to meet it, the one assuming complete knowledge of an a priori
probability distribution $\lambda$ of the possible states $\omega$, the other assuming least
favorable circumstances about $\omega$. In either case one considers only a single
number per procedure rather than the entire function: the Bayesian the average
risk with respect to $\lambda$, the minimax-man in all timidity the supremum of the
risk function. Comparisons thus become trivial in principle, obliging one
to look simply for that procedure which minimizes these numbers. Inasmuch as
the situations occurring in practice with regard to prior knowledge about $\omega$
usually lie between the two extremes just described, to this extent at least both
principles are open to criticism. In view of the nature of the difficulty in choosing
among procedures, the notion of admissible or complete classes of strategies is
generally felt to provide the most satisfactory solution. Whereas it may be
difficult to say what to do in a statistical decision problem, it is generally easier

Received March 7, 1958. 


¹ This paper is the author’s doctoral dissertation, accepted by Stanford University and
published as Technical Report No. 35, Statistics Laboratory, Stanford University, October 
6, 1955; research carried out under the sponsorship of the Office of Naval Research. 




to say what not to do, so that the statistician separates out from consideration 
all inadmissible strategies and presents the practical man with what is at best 
a minimal essentially complete class of procedures. The choice of one from among 
these admissible procedures is then left to the best judgment, intuition, and past 
experience of the practical man. If the class is a small one, we have then achieved
everything one can ask for, for the actual choice will then easily be made. The 
difficulty, however, is that the classes are usually much too big to be of real help. 

The trouble, in a word, lies in the fact that it is the space $\Omega$ itself of possible
states that is generally too big: one simply cannot look at and assess all the values
of $\omega$ for each decision function $\varphi$. Now when problems turn out to be too big for
practical purposes, it is natural to look for ways of cutting them down to size
by methods of simplification or approximation in which very little of the original
problem is lost. It is precisely such a cutting down or slicing up of the problem
that we propose to treat in this paper, in the hope that it may help bring the
theory and practice of statistical decision functions somewhat closer together.

The Modified Minimax Principle. The minimax principle, which looks only
at the single value $\sup_\omega \rho(\omega, \varphi)$, suffers from the defect of being an
over-simplification. Yet it suggests, by means of a simple modification, a natural way of
approximating to the problem. This is to cut $\Omega$ up, to partition it into sets or “slices”
$\Omega_s$, $s$ running over an index set $S$, and then look at
$\sup_{\omega \in \Omega_s} \rho(\omega, \varphi) = a(s, \varphi)$ for each $s$ in $S$, so that
corresponding to the partition $\Omega = \bigcup_{s \in S} \Omega_s$ we look only at the
values $a(s, \varphi)$ for $s \in S$ instead of at $\rho(\omega, \varphi)$ for all $\omega$. The range of
$\rho_\varphi$ is thus replaced by the smaller range of $a_\varphi$, making comparisons of
procedures that much easier. It is this slicing up of $\Omega$ (or its replacement by the
smaller class $S$) and consequent simplification of the risk functions in the above way
that we call the modified minimax principle. (By way of analogy, one might think of upper
Darboux sums approximating Riemann integrals.) The reduced game can then
be treated as any other game: one can play Bayes or minimax in it, or attempt to
delineate its admissible strategies.
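
By way of illustration only, the reduction from $\rho(\omega, \varphi)$ to $a(s, \varphi)$ can be sketched in a few lines of Python for a finite set of states. The six risk values and the slicing by parity are assumptions invented for the example; they are not taken from any problem treated in this paper.

```python
from collections import defaultdict


def sliced_risk(risk, slice_of):
    """Return a(s, phi): the supremum of the risk over each slice Omega_s.

    risk     : dict mapping each state omega to rho(omega, phi)
    slice_of : function mapping a state omega to its slice label s
    """
    a = defaultdict(lambda: float("-inf"))
    for omega, r in risk.items():
        s = slice_of(omega)
        a[s] = max(a[s], r)
    return dict(a)


# Hypothetical toy problem: six states, sliced into even and odd states.
risk_phi = {0: 0.2, 1: 0.9, 2: 0.4, 3: 0.7, 4: 0.1, 5: 0.8}
a_phi = sliced_risk(risk_phi, lambda omega: omega % 2)
print(a_phi)  # {0: 0.4, 1: 0.9}
```

The six-valued risk function is replaced by a two-valued one, which is all that the reduced game retains of the procedure.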

If the “slicing principle” used is such that the supremum of $\rho_\varphi$ over each slice
is not much different from the values within the slice, or has some other reasonable
property, then very little is lost. The question of what are reasonable or
natural slicing principles is clearly of primary importance here, and we shall 
present what we believe are several of them. 

The theory of invariance provides us with the most powerful of these slicing 
principles and will play a central role in our considerations. The natural slicing 
of $\Omega$ into its orbits under the group leads us to what appears to be the best
possible generalization of the Hunt-Stein theorem, and to its most natural setting:
namely to the theorem that under certain regularity conditions which are often 
met in practice the invariant procedures form a complete class in the sense of the 
sliced up risk functions. The theory of composite hypotheses is also discussed 
in this light, and an example of theoretical interest is given illustrating these 
concepts, in which a difficult problem undergoes a striking simplification. 

Finally, the use of previous experience as a slicing principle is discussed, and 







related to a purely game-theoretic model which we have constructed for the 
modified minimax principle and which we have called a mixed game. 

Though these slicing principles appear to be among the most important, the 
search for others continues. A method of approximation and simplification having 
been established, it remains to be seen whether these principles and available 
numerical methods can be combined to make an effective instrument in practice. 


2. The use of invariance as a slicing principle. We begin with a brief descrip- 
tion of the statistical and group-theoretic setup. Background material and 
further remarks on statistical games can be found in Blackwell and Girshick 
[1] and Wald [2]. For measure-theoretic and related notions, we refer to Halmos 
[3]. The formidability of notation, while unavoidable in this subject, is more 
apparent than real, and we shall try to clarify matters with a running com- 
mentary. 

DEFINITION 1. By a statistical problem $(Z, \mathcal{B}, \Omega, P, A, \mathcal{A}, L)$ is meant

i) A set $Z$ of all possible experimental outcomes, a Borel field $\mathcal{B}$ of subsets of
$Z$, a set $\Omega$ of states of nature, and a function $P$ defined on $\mathcal{B} \times \Omega$ such that for
each $\omega \in \Omega$, $P_\omega$ is a probability measure on $(Z, \mathcal{B})$. $(Z, \mathcal{B}, \Omega, P)$ is called the
sample space.

ii) A set of actions $A$ available to the statistician, together with a Borel field
$\mathcal{A}$ of subsets of $A$.

iii) A loss function $L$ defined on $\Omega \times A$ such that $L(\omega, a)$ is the loss to the
statistician when he takes action $a$ and $\omega$ is the true state of nature. For each
$\omega \in \Omega$, $L_\omega$ is assumed to be a non-negative $\mathcal{A}$-measurable function. (A mapping
$f: M \to N$ of one measurable space $(M, \mathcal{M})$ into another $(N, \mathcal{N})$ is said to be
$\mathcal{M}$-$\mathcal{N}$ measurable if the inverse image of every $\mathcal{N}$-set is an $\mathcal{M}$-set. If $(N, \mathcal{N})$ is
Euclidean, with the usual Borel sets, we call the mapping simply $\mathcal{M}$-measurable.
If $E$ is a subset of $M$, we shall write $f''E$ for the image of $E$ under $f$.)

DEFINITION 2. $(G, \gamma_Z, \gamma_\Omega, \gamma_A)$ is said to be an admissible group on the
statistical problem, or the problem is said to be invariant under the group $G$ (for short) if

i) $G$ is a group,

ii) $\gamma_Z$ is a homomorphism of $G$ into the class of all $\mathcal{B}$-$\mathcal{B}$ measurable 1-1
transformations of $Z$ onto itself, $\gamma_\Omega$ is a homomorphism of $G$ into the class of all 1-1
transformations of $\Omega$ onto itself, $\gamma_A$ is a homomorphism of $G$ into the class of all
$\mathcal{A}$-$\mathcal{A}$ measurable 1-1 transformations of $A$ onto itself, where for each $g \in G$ we
shall write $g_Z$ for $\gamma_Z(g)$, $g_\Omega$ for $\gamma_\Omega(g)$, and $g_A$ for $\gamma_A(g)$,

iii) for each $g \in G$, $B \in \mathcal{B}$, $\omega \in \Omega$, we have $P(g_Z''B \mid g_\Omega(\omega)) = P(B \mid \omega)$, and

iv) for each $g \in G$, $\omega \in \Omega$, $a \in A$, we have $L(g_\Omega(\omega), g_A(a)) = L(\omega, a)$.

What this definition says is that each element $g$ of the group $G$ is in effect three
simultaneous permutations or relabelings, $g_Z$, $g_\Omega$, and $g_A$, of the elements of
$Z$, $\Omega$, and $A$, respectively, together with their respective Borel fields, under which
probabilities of sets and losses due to actions are invariant. The homomorphisms
further imply that we have three groups $G_Z$, $G_\Omega$, and $G_A$ of such permutations.

As a simple example illustrating these concepts, we may think of the following:
Let $Z$ be the real line, $\mathcal{B}$ the ordinary Borel sets, $\Omega$ the real line, $P_\omega$ the normal
distribution with mean $\omega$ and variance 1, $A = Z$, $\mathcal{A} = \mathcal{B}$, and $L(\omega, a) = W(\omega - a)$,
i.e., the loss is some function depending only on the difference $\omega - a$. The problem
is easily seen to be invariant under the full translation group $G$, each element of
which gives rise to three identical translations of the three real lines $Z$, $\Omega$, $A$ by
the same real number. The reader may generalize this example at once to the
$n$-dimensional case, with $P_\omega$ the multivariate normal distribution $N(\omega, I)$, vector
$\omega = (\omega_1, \dots, \omega_n)$ of means, covariance matrix $I$, and translations
$z \to z + g = (z_1 + g_1, \dots, z_n + g_n)$.
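
The one-dimensional example can be checked numerically at individual points. The sketch below is only a spot check, not a proof: it verifies condition iv) of Definition 2 directly, and condition iii) in density form (the density of $P_\omega$ with respect to Lebesgue measure, which is itself translation invariant). The squared-error loss is our own choice of a function of $\omega - a$, made purely for illustration.

```python
import math


def normal_density(z, omega):
    """Density of the normal distribution N(omega, 1) at the point z."""
    return math.exp(-0.5 * (z - omega) ** 2) / math.sqrt(2.0 * math.pi)


def squared_error_loss(omega, a):
    """An illustrative loss depending only on omega - a (an assumed choice)."""
    return (omega - a) ** 2


def is_translation_invariant(g, z, omega, a):
    """Spot-check invariance under the translation by g at one point (z, omega, a)."""
    same_density = math.isclose(
        normal_density(z + g, omega + g), normal_density(z, omega)
    )
    same_loss = math.isclose(
        squared_error_loss(omega + g, a + g), squared_error_loss(omega, a)
    )
    return same_density and same_loss


print(is_translation_invariant(g=2.5, z=0.25, omega=1.5, a=-0.75))  # True
```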

DEFINITION 3. A randomized decision procedure is a function $\varphi$ on $\mathcal{A} \times Z$
to the unit interval of reals $[0, 1]$ such that for each $z$, $\varphi_z$ is a probability measure
on $(A, \mathcal{A})$, i.e., $\varphi(T \mid z)$ is the probability of taking an action in $T \in \mathcal{A}$, given $z$,
and such that for fixed $T$, $\varphi_T$ is a $\mathcal{B}$-measurable function of $z$. We write $\Phi$ for the
class of all randomized decision functions.

DEFINITION 4. For any state of nature $\omega$ and decision procedure $\varphi$, the value
of the risk function $\rho$ is given by

$$\rho(\omega, \varphi) = \int_Z \int_A L(\omega, a) \, d\varphi(a \mid z) \, dP_\omega(z).$$

DEFINITION 5. For each $g \in G$, we make correspond to each procedure $\varphi$ a new
procedure $g_\Phi \varphi$ given by

$$(g_\Phi \varphi)(T \mid z) = \varphi(g_A T \mid g_Z(z)).$$


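It is an easy but very important consequence of our definitions that the risk
function $\rho$ is invariant, or as we say, transforms “correctly,” under the group by
means of the formula

$$\rho(\omega, g_\Phi \varphi) = \rho(g_\Omega(\omega), \varphi) \quad \text{for every } g \in G.$$

The simple verification of this is as follows:

$$\rho(\omega, g_\Phi \varphi) = \int_Z \int_A L(\omega, a) \, d\varphi(g_A(a) \mid g_Z(z)) \, dP(z \mid \omega).$$

Writing $L(\omega, a) = L(g_\Omega(\omega), g_A(a))$ and $dP(z \mid \omega) = dP(g_Z(z) \mid g_\Omega(\omega))$, then
relabeling $g_Z(z) = z'$ and $g_A(a) = a'$, we get

$$\int_Z \int_A L(g_\Omega(\omega), a') \, d\varphi(a' \mid z') \, dP(z' \mid g_\Omega(\omega)) = \rho(g_\Omega(\omega), \varphi).$$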


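DEFINITION 6. A decision procedure $\varphi$ is said to be invariant under the group
if $g_\Phi \varphi = \varphi$ for every $g \in G$, i.e., if

$$\varphi(g_A T \mid g_Z(z)) = \varphi(T \mid z) \quad \text{for all } g, z, T.$$

Note, then, that if $\varphi$ is invariant, $\rho(\omega, \varphi) = \rho(g_\Omega(\omega), \varphi)$ for every $g \in G$.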
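
The transformation formula for the risk, and the remark that an invariant procedure has risk unchanged by the group, can both be verified exactly in a small finite model. The sketch below is an illustration under assumptions of our own: $Z = \Omega = A = \{0, 1, 2, 3\}$, the group is addition modulo 4 acting identically on all three sets, and the noise distribution, loss table, and procedures are invented numbers. Exact rational arithmetic is used so that equalities hold without rounding.

```python
from fractions import Fraction as F

N = 4  # Z = Omega = A = {0, 1, 2, 3}; the group is addition mod N (assumed model)
NOISE = [F(1, 2), F(1, 4), F(1, 8), F(1, 8)]  # P(z | omega) = NOISE[(z - omega) % N]
LOSS = [F(0), F(1), F(3), F(1)]               # L(omega, a) = LOSS[(omega - a) % N]


def risk(omega, phi):
    """rho(omega, phi) = sum over z, a of L(omega, a) * phi[z][a] * P(z | omega)."""
    return sum(
        LOSS[(omega - a) % N] * phi[z][a] * NOISE[(z - omega) % N]
        for z in range(N)
        for a in range(N)
    )


def transform(g, phi):
    """(g phi)(a | z) = phi(a + g | z + g), the finite form of Definition 5."""
    return [[phi[(z + g) % N][(a + g) % N] for a in range(N)] for z in range(N)]


# A procedure that is not invariant: rows are outcomes z, columns are actions a.
PHI = [
    [F(1), F(0), F(0), F(0)],
    [F(1, 2), F(1, 2), F(0), F(0)],
    [F(0), F(0), F(1), F(0)],
    [F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
]

# An invariant procedure: phi[z][a] depends only on (a - z) mod N.
KERNEL = [F(3, 4), F(1, 8), F(0), F(1, 8)]
PHI_INV = [[KERNEL[(a - z) % N] for a in range(N)] for z in range(N)]


def risk_transforms_correctly(phi):
    """Check rho(omega, g phi) == rho(g omega, phi) for every g and omega."""
    return all(
        risk(omega, transform(g, phi)) == risk((omega + g) % N, phi)
        for g in range(N)
        for omega in range(N)
    )


print(risk_transforms_correctly(PHI))  # True
```

Since the group here is transitive on $\Omega$, the risk of `PHI_INV` takes a single value over all four states.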
A brief word at this point on the effect in general of an arbitrary group of per- 
mutations on an arbitrary set will give some direction to these considerations 







and will enable us to convey the essence and spirit of what is known as the in- 
variance principle. 

Let G be a group of permutations of the elements of a set X. The existence of 
a permutation sending one element of X into another is readily seen to establish 
an equivalence relation in X and gives rise to a useful partition of X into equiva- 
lence classes, known as “orbits,” under the group. Formally, given $x \in X$, the
orbit $V_x$ to which it belongs is the set of elements $y$ of $X$ given by

$$V_x = \{y : y = gx \text{ for at least one } g \in G\}.$$


Now so far as the effect of G in its bearing on X is concerned, there is no dis- 
tinguishing between elements of the same orbit. Taking an anthropomorphic 
view of the group, from the point of view of “‘the man in the group,”’ all elements 
in an orbit look alike, the group in its dealings with the set displaying blindness 
to a mere matter of a difference in labels. To put it another way, when the man
in the group looks at X, he sees only orbits. 
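
For a finite set the partition into orbits can be computed directly. The following sketch is illustrative; the function name `orbits` and the sign-change example are our own. It relies on the fact that, for permutations of a finite set, closing a point under the generators alone already reaches its whole orbit, since each inverse is a power of the permutation.

```python
def orbits(points, generators):
    """Partition `points` into orbits under the group generated by `generators`.

    Each generator is a function sending a point to a point and is assumed
    to be a permutation of the finite set `points`.
    """
    remaining = set(points)
    result = []
    while remaining:
        start = min(remaining)
        orbit, frontier = {start}, [start]
        while frontier:
            x = frontier.pop()
            for g in generators:
                y = g(x)
                if y not in orbit:
                    orbit.add(y)
                    frontier.append(y)
        result.append(sorted(orbit))
        remaining -= orbit
    return result


# Hypothetical example: the group {identity, x -> -x} acting on the integers -3..3.
print(orbits(range(-3, 4), [lambda x: -x]))
# [[-3, 3], [-2, 2], [-1, 1], [0]]
```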

The invariance principle. Returning to our statistical setup, the space $Z$ of
experimental outcomes breaks up into orbits under the influence of $G$, more
specifically under the permutation group $G_Z$. $\Omega$ also breaks up into orbits under
$G_\Omega$. By the invariance principle is meant the adoption by the statistician of the
viewpoint of the man in the group $G$: he becomes himself the man in the group
in that he too sees the problem in terms of orbits only. Specifically, this means
that the only decision procedures he will employ are the invariant ones, that is
to say, those procedures which, while free to vary at will from orbit to orbit of
$Z$, must exhibit within each orbit complete consistency with respect to the groups
$G_Z$ and $G_A$ as prescribed in Definition 6.

The main reason for restricting ourselves to invariant procedures has always 
been their undeniable plausibility. We are further encouraged by the fact that 
in many problems (under a finite group, say, and in other cases) an invariant 
decision procedure admissible within the class of all invariant procedures is 
known to be admissible among all procedures as well. Moreover, if we should be 
looking for minimax strategies, the Hunt-Stein theorem (to be described in the 
next section in terms of testing hypotheses) tells us that under certain weak as- 
sumptions a minimax invariant procedure is minimax among all procedures too. 
For us, however, there is now another and more compelling reason. It turns out 
that the use of invariance as a slicing principle, so that orbits rather than points 
become the basic elements under consideration, leads in our modified minimax 
sense to the best possible generalization of the Hunt-Stein result, namely, that 
the invariant procedures form a complete class in the sense of the sliced up risk 
functions. In the modified minimax sense, then, the use of invariance provides 
a striking simplification of the original problem, particularly when the orbits, 
hence the groups, are rather large. 

It should be noted in this connection, by the remark in Definition 6 above,
that risk functions $\rho_\varphi$ for invariant procedures $\varphi$ are constant over each orbit of
$\Omega$, as is to be expected, so that invariant procedures lose nothing under the modified
minimax principle of taking suprema. Before presenting our main result in
some detail, it will be convenient and appropriate to give a brief description of
the invariance principle in the language of testing hypotheses, and to state the
classical Hunt-Stein theorem in these terms.


3. The principle of invariance in testing hypotheses. Let $(Z, \mathcal{B}, \Omega, P)$ with
$\Omega = \Omega_0 \cup \Omega_1$, a disjoint union, be a sample space, and suppose we want to test
$H_0: \omega \in \Omega_0$ against $H_1: \omega \in \Omega_1$.

Let there exist a $\sigma$-finite measure $\nu$ on $(Z, \mathcal{B})$ such that there is a real-valued
function $p$ on $Z \times \Omega$ with

$$P(B \mid \omega) = \int_B p(z \mid \omega) \, d\nu(z) \quad \text{for every } \omega \in \Omega \text{ and } B \in \mathcal{B},$$

i.e., we are assuming the existence of a $\sigma$-finite measure $\nu$ which dominates all
our probability distributions $P_\omega$, and $p_\omega$ is the Radon-Nikodym derivative
$dP_\omega / d\nu$.

DEFINITION 1. The group $G$ is said to keep the testing problem invariant if
$\omega \in \Omega_i$ implies $g_\Omega(\omega) \in \Omega_i$, $i = 0, 1$, for every $g \in G$, that is to say, if each
permutation $g_\Omega$ of the elements of $\Omega$ permutes the elements of $\Omega_0$ and of $\Omega_1$ separately.
(Note that this implies for the testing problem that we are specializing the general
statistical problem down to a two-action setup $A = \{0, 1\}$ with constant losses
due to a wrong decision and $g_A = I$, the identity mapping, for every $g$.)

DEFINITION 2. By a randomized test is meant a $\mathcal{B}$-measurable real-valued
function $\varphi$ on $Z$ with values in the closed interval $[0, 1]$, where $\varphi(z)$ is the probability
of rejecting $H_0$ when $z$ is the observed sample point.

DEFINITION 3. The test $\varphi$ is said to be invariant under $G$ if for all $g$ and $z$,
$\varphi(g_Z(z)) = \varphi(z)$ except at most on a set of $\nu$-measure 0, i.e., if $\varphi$ is essentially
constant on the orbits of $Z$. If the exceptional set of $\nu$-measure 0 depends on $g$,
then $\varphi$ is called almost invariant.

Let $\mathcal{C}$ be a Borel field of subsets of $G$. Two assumptions will always be made:

i) $(g, z) \to g_Z(z)$ is $\mathcal{C} \times \mathcal{B}$-$\mathcal{B}$ measurable,

ii) $(g_1, g_2) \to g_1 g_2$ is $\mathcal{C} \times \mathcal{C}$-$\mathcal{C}$ measurable.

DEFINITION 4. A measure $\mu$ on $(G, \mathcal{C})$ is said to be right invariant if $\mu(Cg) =
\mu(C)$ for every $g \in G$ and every $C \in \mathcal{C}$. Left invariance is defined using $\mu(gC) = \mu(C)$.

As an example of an invariant measure we may take for $G$ the additive group
of reals, $\mathcal{C}$ the ordinary Borel sets, and $\mu$ Lebesgue measure on the reals. No
invariant probability measure, however, exists for this group, which gives rise to
the following useful limiting notion.

DEFINITION 5. A sequence $\{\mu_n\}$ of probability measures on $(G, \mathcal{C})$ is said to
be asymptotically right invariant if $\lim_{n \to \infty} (\mu_n(Cg) - \mu_n(C)) = 0$ for every
$g \in G$ and $C \in \mathcal{C}$.

As an example we take the same group just given, and for each $C \in \mathcal{C}$ and
$n = 1, 2, \dots$, define the probability measure $\mu_n$ on $(G, \mathcal{C})$ by

$$\mu_n(C) = \frac{1}{2n} \, \mu(C \cap [-n, n]),$$

i.e., we approximate Lebesgue measure $\mu$ by the conditional probabilities $\mu_n$
given that $g$ belongs to the closed interval $[-n, n]$. Clearly the $\mu_n$ are all
probability measures and $1/2n$ is the normalizing factor. To prove asymptotic
invariance, we have, since clearly $\mu(Cg \cap [-n, n]) = \mu(C \cap [-n - g, n - g])$,

$$|\mu_n(Cg) - \mu_n(C)| = \left| \frac{1}{2n} \mu(C \cap [-n - g, n - g]) - \frac{1}{2n} \mu(C \cap [-n, n]) \right| \le \frac{|g|}{2n} \to 0 \quad \text{as } n \to \infty,$$

showing in fact that the approach to zero for each $g$ is here even uniform in $C$,
that is,

$$\lim_{n \to \infty} \sup_C |\mu_n(Cg) - \mu_n(C)| = 0 \quad \text{for every } g \in G.$$

(This example generalizes at once to the additive group $G$ (real $p$-tuples) of a
$p$-dimensional real linear space. One takes conditional probabilities of Lebesgue
measure given the cubes $[(-n, \dots, -n); (n, \dots, n)]$, normalizing by $1/(2n)^p$,
and, writing $g = (g_1, \dots, g_p)$, showing that for $n$ sufficiently large, in fact for
$2n > \max_{i = 1, \dots, p} |g_i|$,

$$|\mu_n(Cg) - \mu_n(C)| \le 1 - \prod_{i=1}^{p} \left( 1 - \frac{|g_i|}{2n} \right) \to 0 \quad \text{as } n \to \infty.)$$
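
The bound $|g|/2n$ in the one-dimensional example can be observed numerically when $C$ is a finite union of intervals. The sketch below is only a sanity check under that restriction; the particular set $C$ and shift $g$ are hypothetical values chosen for the illustration.

```python
def lebesgue_of_clipped(intervals, lo, hi):
    """Lebesgue measure of C ∩ [lo, hi], with C a union of disjoint intervals (a, b)."""
    return sum(max(0.0, min(b, hi) - max(a, lo)) for a, b in intervals)


def mu_n(intervals, n):
    """mu_n(C) = (1 / 2n) * mu(C ∩ [-n, n])."""
    return lebesgue_of_clipped(intervals, -n, n) / (2.0 * n)


def shift(intervals, g):
    """The translate Cg = {c + g : c in C}."""
    return [(a + g, b + g) for a, b in intervals]


def invariance_gap(intervals, g, n):
    """|mu_n(Cg) - mu_n(C)|, the quantity bounded by |g| / 2n in the text."""
    return abs(mu_n(shift(intervals, g), n) - mu_n(intervals, n))


# Hypothetical set C and shift g.
C = [(-50.0, -10.0), (3.0, 200.0)]
G_SHIFT = 7.5
for n in (10, 100, 1000):
    print(n, invariance_gap(C, G_SHIFT, n), abs(G_SHIFT) / (2.0 * n))
```

Each printed gap should not exceed the bound in the last column, up to floating-point rounding.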


The Hunt-Stein theorem [4], in the language of testing problems, then reads:

THEOREM. Let there exist, for the testing problem as just defined, an asymptotically
(right) invariant sequence of probability measures $\{\mu_n\}$ on $(G, \mathcal{C})$. Then there exists
an invariant test $\varphi_0$ among all tests $\varphi$ for which $\int \varphi(z) p(z \mid \omega) \, d\nu(z) \le \alpha$ for all
$\omega \in \Omega_0$, and which maximizes $\inf_{\omega \in \Omega_1} \int \varphi(z) p(z \mid \omega) \, d\nu(z)$; i.e., there exists an
invariant minimax test $\varphi_0$. (The probability with which a test $\varphi$ rejects $H_0$ when $\omega$ is the
true state of nature is given by $\beta_\varphi(\omega) = \int \varphi(z) p(z \mid \omega) \, d\nu(z)$. $\beta_\varphi$ is called the power
function of the test and is the only property of the test that interests us, bearing
an obvious relation to the risk function. The original version of the Hunt-Stein
theorem was in terms of most stringent tests. The connection with this version
is easily seen if we subtract one minus the size $\alpha$ envelope power function $\beta^*_\alpha(\omega)$
from the loss function for $\omega \in \Omega_1$. The maximum risk for $\omega \in \Omega_1$ is then the
stringency.)

It will be very helpful for our later understanding to present a brief sketch of 
the proof for this simple case, as it contains the essential ideas without the tech- 
nical difficulties of the generalization. It depends rather heavily on the fact that 
the space of all randomized tests $\varphi$ is compact in the weak-* topology, or more
simply upon one version of this which is the following well-known lemma due 
to F. Riesz (see Banach [5] page 130). 

LEMMA OF F. RIESZ. If $\{\varphi_n\}$ is a sequence of $\mathcal{B}$-measurable functions on $Z$ to the
closed interval $[0, 1]$ (a sequence of randomized tests in our terminology), then there
exists a subsequence $\{\varphi_{n_i}\}$ and a test $\varphi'$ such that for every function $f$ integrable with
respect to $\nu$

$$\lim_{i \to \infty} \int \varphi_{n_i}(z) f(z) \, d\nu(z) = \int \varphi'(z) f(z) \, d\nu(z).$$


PROOF OF THE HUNT-STEIN THEOREM: The first step of the proof consists in
showing, by a simple application of Riesz’s lemma, that a minimax solution
$\varphi_0$ exists. (It is in fact the $\varphi'$ of the lemma associated with a sequence of tests
approaching the minimax condition.) For every $g \in G$, then, we have that the test
$g_\Phi \varphi_0$ given by $(g_\Phi \varphi_0)(z) = \varphi_0(g_Z(z))$ is also minimax, since the invariance of the
risk function simply means here that the values of the power function of $g_\Phi \varphi_0$
are just those of the power function of $\varphi_0$ permuted separately over $\Omega_0$ and over
$\Omega_1$. The minimax tests $g_\Phi \varphi_0$ are then averaged out by the probability measures
$\mu_n$ on $(G, \mathcal{C})$, giving rise to the sequence of tests

$$\psi_n(z) = \int_G \varphi_0(g_Z(z)) \, d\mu_n(g).$$

These tests $\psi_n$ are again minimax, as a simple examination of their power
functions shows (interchanging orders of integration by Fubini’s theorem, we get,
for each $\omega$, permuted values of power functions averaged out over $G$ by $\mu_n$). Now
take $\{\psi_{n_i}\}$ and $\psi_0$ in the sense of Riesz’s lemma. Then just as for $\varphi_0$ above, $\psi_0$ is
also a minimax test. The one remaining difficulty is to show that $\psi_0$ is an
invariant test, more precisely an almost invariant test, and this is accomplished
in two steps by a straightforward integration argument:

1) To show, for each $g \in G$, that $\psi_0(g_Z(z)) = \psi_0(z)$ for almost all $z \, [\nu]$, it suffices
to show that $\int \psi_0(g_Z(z)) f(z) \, d\nu(z) = \int \psi_0(z) f(z) \, d\nu(z)$ for every integrable
function $f$,

2) and this readily reduces to showing that a consequence of asymptotic
invariance of a sequence $\{\mu_n\}$ of probability measures is that

$$\lim_{n \to \infty} \left| \int y(g'g) \, d\mu_n(g') - \int y(g') \, d\mu_n(g') \right| = 0$$

for every bounded measurable function $y$ of $g'$.

Having done this, $\psi_0$ is almost invariant. Finally, a standard argument allows
us to replace $\psi_0$ by an invariant test $\bar{\psi}_0$ such that $\bar{\psi}_0 = \psi_0$ almost everywhere
with respect to $\nu$, provided only that another regularity condition is imposed on
the group $G$.
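
The averaging step of the proof is easiest to see for a finite group, where the uniform distribution on the group is exactly invariant and no limiting argument is needed. The sketch below illustrates only that special case, not the general proof; the sample space $\{0, \dots, 5\}$, the reflection group, and the test values are assumptions made for the example.

```python
from fractions import Fraction as F


def average_over_group(test, group):
    """psi(z) = (1 / |G|) * sum over g in G of test(g(z)), for a finite group G."""
    return {z: sum(test[g(z)] for g in group) / len(group) for z in test}


# Hypothetical setup: Z = {0, ..., 5} and G = {identity, reflection z -> 5 - z}.
GROUP = [lambda z: z, lambda z: 5 - z]

# A randomized test (rejection probabilities) that is not invariant.
PHI0 = {0: F(1), 1: F(1, 2), 2: F(0), 3: F(0), 4: F(1, 4), 5: F(0)}

PSI = average_over_group(PHI0, GROUP)
print(PSI[0], PSI[5])  # 1/2 1/2
print(PSI[1], PSI[4])  # 3/8 3/8
```

The averaged test takes the same value at $z$ and at its reflection, that is, it is constant on each orbit of $Z$.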


4. The modified minimax principle, completeness, and the generalized Hunt- 
Stein theorem. We return in this section to the general setup of a statistical 
decision problem invariant under a group of transformations, as described in 
Section 2, and assume in addition that all our probability distributions $P_\omega$ are
dominated by a $\sigma$-finite measure $\nu$ on $(Z, \mathcal{B})$. Without loss of generality, see Halmos
and Savage [6], $\nu$ may be assumed equivalent to the family of all $P_\omega$, that is,
if $P_\omega(B) = 0$ for all $\omega$, then $\nu(B) = 0$.







From Definitions 3 and 4 of Section 2, it is clear that a decision procedure $\varphi$ can
be changed on a set of $\nu$-measure 0 without changing its risk function, so that
two procedures differing on such a set may henceforth always be taken as equivalent.
If in Definition 6 we have $g_\Phi \varphi = \varphi$ almost everywhere $[\nu]$ for each $g \in G$, where
the exceptional set of $\nu$-measure 0 depends on $g$, we shall call $\varphi$ almost invariant.
We denote the class of almost invariant procedures by $\Phi^*$ and the class of invariant
procedures by $\Phi^{**}$. Further, we write $\Omega = \bigcup_{s \in S} \Omega_s$, where the $\Omega_s$, $s$ running
over an index set $S$, are the orbits of $\Omega$ under the group $G_\Omega$.

In strict accordance with our modified minimax principle as described in 
Sections 1 and 2, we have the following definition: 

DEFINITION 1. A procedure $\varphi$ is said to be at least as good in the modified
minimax sense as a procedure $\psi$ if

$$\sup_{\omega \in \Omega_s} \rho(\omega, \varphi) \le \sup_{\omega \in \Omega_s} \rho(\omega, \psi) \quad \text{for every } s \in S.$$

The related notions of admissibility, complete class, etc., in the modified minimax
sense are all similarly defined, in the obvious way. Henceforth in this section, such
phrases as “at least as good as” are always to be understood in the modified
minimax sense.
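
When the orbits are finite the comparison in Definition 1 is a finite check. The sketch below is illustrative; the two risk tables and the two orbits are hypothetical. They are chosen so that $\varphi$ is at least as good as $\psi$ in the modified minimax sense even though $\varphi$ does not dominate $\psi$ state by state.

```python
def at_least_as_good(risk_phi, risk_psi, orbits):
    """True if, on every orbit, the largest risk of phi is <= the largest risk of psi."""
    return all(
        max(risk_phi[w] for w in orbit) <= max(risk_psi[w] for w in orbit)
        for orbit in orbits
    )


# Hypothetical problem with four states and two orbits.
ORBITS = [[0, 1], [2, 3]]
RISK_PHI = {0: 0.3, 1: 0.5, 2: 0.2, 3: 0.2}
RISK_PSI = {0: 0.1, 1: 0.6, 2: 0.4, 3: 0.1}

print(at_least_as_good(RISK_PHI, RISK_PSI, ORBITS))  # True
print(at_least_as_good(RISK_PSI, RISK_PHI, ORBITS))  # False
```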

In generalizing upon the Hunt-Stein theorem, we run into the complicated 
notions of weak-* compactness, convergence and cluster point for a set of
measures on a topological space. As these concepts are by no means obvious, 
it will be necessary to spell them out in some detail. Readers who wish to avoid 
topological difficulties may pass lightly over this part of the proof, pausing only 
long enough to note that in substance it is designed to show how a cluster point 
of a sequence of procedures may be used in constructing a new decision procedure 
capable of playing the central role analogous to the one in Riesz’s lemma. 

With the general plan of the proof of the Hunt-Stein theorem as outlined in 
Section 3 firmly in mind, we now proceed to our generalization. It will be useful
to make here the obvious generalization of Definition 5, Section 3, to an asymptotically
right invariant net $\{\mu_\alpha\}$ of probability measures on $(G, \mathcal{C})$. Readers who
are unfamiliar with the notion of nets may continue to think of sequences. For a 
discussion of nets we refer to Kelley [7]. The modified minimax principle, it 
should be remembered, requires us to pay strict attention to the fact that the 
various permutations mentioned there never take us out of an orbit, leading to 
the sharper result of completeness. 

THE GENERALIZED HUNT-STEIN THEOREM. Let $(Z, \mathcal{B}, \Omega, P, A, \mathcal{A}, L)$ be a
statistical problem invariant under a group $G$ as previously described. If the following
regularity conditions are satisfied,

i) there exists a $\sigma$-finite measure $\nu$ equivalent to the set of all $P_\omega$,

ii) there is a topology on the action set $A$ such that $A$ is a separable metric space
and such that $\mathcal{A}$ is the Borel field generated by the compact subsets of $A$, and the loss
function $L$ is such that for each $\omega \in \Omega$, $L(\omega, a)$ is non-negative and continuous in $a$
and is such that for every real number $r$, the set $\{a : L(\omega, a) \le r\}$ is compact,

iii) $\mathcal{C}$ is a Borel field of subsets of $G$ and there is an asymptotically right invariant
net $\{\mu_\alpha\}$ of probability measures on $(G, \mathcal{C})$,
then the class $\Phi^*$ of almost invariant decision procedures is essentially complete
in the modified minimax sense. If in addition

iv) $G$ is a locally compact $\sigma$-compact topological group with $\mathcal{C}$ generated by the
compact subsets of $G$,
then the class $\Phi^{**}$ itself is essentially complete in this sense.

PROOF. Let $\varphi \in \Phi$ be given. We are required to find a $\varphi^* \in \Phi^*$ at least as good
as $\varphi$. We may assume without loss of generality that the risks of $\varphi$ are bounded
on each orbit $\Omega_s$ by finite numbers $m_s$, for otherwise we may clearly ignore such
orbits when comparing a possible $\varphi^*$ and have only to carry out the following
argument for the remaining orbits.

A) We observe first of all that for each $g \in G$, the procedure $g_\Phi \varphi$ is equivalent to
$\varphi$ in the modified minimax sense. This follows at once from the invariance of the
risk function, expressed by the formula $\rho(\omega, g_\Phi \varphi) = \rho(g_\Omega(\omega), \varphi)$, which tells us
that the values of one risk function are simply those of the other permuted within
each orbit of $\Omega$ separately.

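We define for each $\alpha$,

$$\varphi_\alpha(T \mid z) = \int_G \varphi(g_A T \mid g_Z(z)) \, d\mu_\alpha(g),$$

a net of randomized procedures got by averaging the procedures $g_\Phi \varphi$ with respect
to the measures $\mu_\alpha$ on $(G, \mathcal{C})$, and shall show by a simple calculation of their risk
functions that the $\varphi_\alpha$ are all at least as good as $\varphi$.

$$\rho(\omega, \varphi_\alpha) = \int_Z \int_A L(\omega, a) \, d\varphi_\alpha(a \mid z) \, dP_\omega(z).$$

The Lebesgue integral transformation comes to our aid here, enabling us to
compute such an expression by writing it as

$$\int_Z \int_0^\infty \varphi_\alpha(\{a : L(\omega, a) > h\} \mid z) \, dh \, dP_\omega(z)
= \int_Z \int_0^\infty \int_G \varphi(g_A \{a : L(\omega, a) > h\} \mid g_Z(z)) \, d\mu_\alpha(g) \, dh \, dP_\omega(z).$$

Now interchanging orders of integration (the integrand is non-negative) we integrate
out with respect to $h$ and the Lebesgue integral transformation in reverse gives us

$$\int_G \int_Z \int_A L(\omega, a) \, d\varphi(g_A(a) \mid g_Z(z)) \, dP_\omega(z) \, d\mu_\alpha(g)
= \int_G \rho(\omega, g_\Phi \varphi) \, d\mu_\alpha(g)
= \int_G \rho(g_\Omega(\omega), \varphi) \, d\mu_\alpha(g)
\le \sup_{\omega' \in \Omega_s} \rho(\omega', \varphi),$$

where $\Omega_s$ is the particular orbit to which $\omega$ belongs; hence,

$$\sup_{\omega \in \Omega_s} \rho(\omega, \varphi_\alpha) \le \sup_{\omega \in \Omega_s} \rho(\omega, \varphi) \quad \text{for all } s \in S,$$

so that the $\varphi_\alpha$ are all at least as good as $\varphi$.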
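
Step A) can be traced through exactly in a finite model with more than one orbit. The sketch below is an illustration under assumptions of our own, not part of the proof: $Z = \Omega = A = \{0, 1, 2, 3\}$, the group consists of the shifts by 0 and 2 modulo 4 (so that $\Omega$ has the two orbits $\{0, 2\}$ and $\{1, 3\}$), and the averaging measure is uniform on the group. The risk of the averaged procedure at $\omega$ is then the average of the risks of $\varphi$ over the orbit of $\omega$, and so cannot exceed the orbit supremum.

```python
from fractions import Fraction as F

N = 4
GROUP = [0, 2]  # shifts mod 4 (assumed); orbits of Omega are {0, 2} and {1, 3}
NOISE = [F(1, 2), F(1, 4), F(1, 8), F(1, 8)]  # P(z | omega) = NOISE[(z - omega) % N]
LOSS = [F(0), F(1), F(3), F(1)]               # L(omega, a) = LOSS[(omega - a) % N]

# A hypothetical procedure: rows are outcomes z, columns are actions a.
PHI = [
    [F(1), F(0), F(0), F(0)],
    [F(1, 2), F(1, 2), F(0), F(0)],
    [F(0), F(0), F(1), F(0)],
    [F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
]


def risk(omega, phi):
    """rho(omega, phi) = sum over z, a of L(omega, a) * phi[z][a] * P(z | omega)."""
    return sum(
        LOSS[(omega - a) % N] * phi[z][a] * NOISE[(z - omega) % N]
        for z in range(N)
        for a in range(N)
    )


def averaged(phi):
    """phi_bar(a | z) = (1 / |G|) * sum over g of phi(a + g | z + g)."""
    return [
        [sum(phi[(z + g) % N][(a + g) % N] for g in GROUP) / len(GROUP)
         for a in range(N)]
        for z in range(N)
    ]


def orbit_sup(omega, phi):
    """Supremum of the risk of phi over the orbit of omega."""
    return max(risk((omega + g) % N, phi) for g in GROUP)


PHI_BAR = averaged(PHI)
for omega in range(N):
    print(omega, risk(omega, PHI_BAR), orbit_sup(omega, PHI))
```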

B) Let $\mathcal{K}$ be the collection of compact subsets $K$ of $A$. For fixed $K$ and any
$\varphi \in \Phi$, $\varphi_K$ is a $\mathcal{B}$-measurable function on $Z$ to the closed interval $[0, 1]$. By $\Phi_K$ we
mean the collection of all these functions $\varphi_K$ with $\varphi \in \Phi$, i.e., the set of all
randomized tests. Now the Banach space $L^\infty(\nu)$ of all bounded $\mathcal{B}$-measurable functions
is the adjoint of the Banach space $L^1(\nu)$ of all extended real-valued functions
integrable with respect to $\nu$, and Alaoglu’s theorem tells us that the solid
unit sphere of the adjoint of a Banach space is compact in the weak-* topology.
Hence $\Phi_K$, as the intersection of a closed set with the solid unit sphere, is also
compact in the weak-* topology on $L^\infty(\nu)$. By Tychonoff’s compactness theorem,
the cartesian product of the compact sets $\Phi_K$ for all $K \in \mathcal{K}$ is a compact set in the
product topology of these Banach spaces $L^\infty(\nu)$. Therefore our net of functions $\varphi_\alpha$
defined above has a cluster point $\hat{\varphi}$ in the product space. A consequence of this,
called the cluster point condition, is that for any finite number of sets $K$, any
finite number of functions $f$ integrable with respect to $\nu$ and any number $\varepsilon > 0$,
there is an arbitrarily large $\alpha$ such that

$$\left| \int \varphi_\alpha(K \mid z) f(z) \, d\nu(z) - \int \hat{\varphi}(K \mid z) f(z) \, d\nu(z) \right| < \varepsilon.$$

Further, since $\hat{\varphi}$ is a cluster point of the net of functions $\varphi_\alpha$, there is a subnet
$\varphi_{\alpha_\beta}$ converging to it. To simplify the argument, we shall henceforth confine
ourselves to this subnet and drop the second subscript $\beta$, so that $\hat{\varphi}$ is now a limit
point of the $\{\varphi_\alpha\}$, i.e., for any compact set $K$ and any integrable $f$ we have

$$\int \varphi_\alpha(K \mid z) f(z) \, d\nu(z) \to \int \hat{\varphi}(K \mid z) f(z) \, d\nu(z).$$


Note that nothing is said about $\hat{\varphi}$ being a procedure, for it most certainly
need not be. However, we have the following. By definition, every procedure $\varphi$
has the properties

1) If $K_1 \subset K_2$, then $\varphi_{K_1} \le \varphi_{K_2}$;

2) If $K_1 \cap K_2 = \emptyset$, then $\varphi_{K_1 \cup K_2} = \varphi_{K_1} + \varphi_{K_2}$;

3) $\varphi_K \le 1$;
and it is easily seen that the limit point $\hat{\varphi}$ must have the same three properties
almost everywhere with respect to $\nu$ (in fact has the third property everywhere).
For example, we show property 2). 

Let $f$ be a function everywhere positive and integrable with respect to $\nu$ (such an $f$ clearly exists since $\nu$ is $\sigma$-finite). Define a function $g$ by

\[
g(z)=f(z)\,[\bar\varphi(K_1\cup K_2\mid z)-\bar\varphi(K_1\mid z)-\bar\varphi(K_2\mid z)].
\]

$g$ is still integrable, as the product of $f$ by a bounded $\mathcal{B}$-measurable function. Applying the limit point condition to the function $g$ and the three sets $K_1\cup K_2$, $K_1$, $K_2$ we have

\[
\int g(z)\,[\bar\varphi(K_1\cup K_2\mid z)-\bar\varphi(K_1\mid z)-\bar\varphi(K_2\mid z)]\,d\nu(z)
-\int g(z)\,[\varphi_\alpha(K_1\cup K_2\mid z)-\varphi_\alpha(K_1\mid z)-\varphi_\alpha(K_2\mid z)]\,d\nu(z)\to 0.
\]

But the second square bracket is zero, hence rewriting $g$

\[
\int f(z)\,[\bar\varphi(K_1\cup K_2\mid z)-\bar\varphi(K_1\mid z)-\bar\varphi(K_2\mid z)]^2\,d\nu(z)=0.
\]

But $f$ was chosen everywhere positive, hence

\[
\bar\varphi_{K_1\cup K_2}=\bar\varphi_{K_1}+\bar\varphi_{K_2}\quad[\nu].
\]


We now show how $\bar\varphi$ may be used to construct a suitable procedure $\varphi^*$.

Let $\mathcal{R}$ be a countable subset of $\mathcal{K}$ such that $\mathcal{R}$ is closed with respect to finite unions and finite intersections and such that every open subset $U$ of $A$ is a countable union of interiors of elements $R$ of $\mathcal{R}$ which are themselves subsets of $U$ (such an $\mathcal{R}$ is known to exist by virtue of our hypothesis (ii) on $A$). Consider the function $\bar\varphi$ restricted to the set $\mathcal{R}$. Thus restricted, the limit point $\bar\varphi$ has the three mentioned properties of a $\varphi$ with only countably many exceptional sets of $\nu$-measure 0, which we may hereby disregard and assume the properties hold everywhere for $\bar\varphi$.

We define an outer measure $\varphi^*$ on the open sets $U$ in $A$ by the formula, for each $z$,

\[
\varphi^*(U\mid z)=\sup_{R\in\mathcal{R},\,R\subset U}\bar\varphi(R\mid z).
\]

$\varphi^*_U$ is clearly a $\mathcal{B}$-measurable function of $z$, and for each $z$ we do have an outer measure by the subadditivity of the supremum. Hence, we can extend $\varphi^*$ to be a Carathéodory outer measure by defining for every subset $W$ of $A$,

\[
\varphi^*(W\mid z)=\inf_{U\supset W}\varphi^*(U\mid z).
\]

$\varphi^*_W$ is clearly $\mathcal{B}$-measurable, and for each $z$ all the Borel sets $T\in\mathcal{A}$ are measurable. Restricting $\varphi^*$ to the set $\mathcal{A}$, it is thus a measure on $(A,\mathcal{A})$ for each $z$. If we can show, therefore, that $\varphi^*(A\mid z)=1$ $[\nu]$, $\varphi^*$ will then be a randomized decision procedure. To do this, we first show that for every compact set $K$, $\bar\varphi_K\le\varphi^*_K$ $[\nu]$. Let $U\supset K$ be any open set. Then we have seen $U$ may be written as $U=\bigcup_{R_i\subset U}(\text{interior of } R_i)$, which is then an open covering for the compact set $K$, so that for some $n$, $K\subset\bigcup_{i=1}^n R_i=R\in\mathcal{R}$. Hence $\bar\varphi_K\le\bar\varphi_R$ $[\nu]$. But $R\subset U$, so that $\bar\varphi_K\le\varphi^*_U$ $[\nu]$. Now write $K=\bigcap_j U_j$, the intersection of a descending sequence of open sets, which can be done since our space is metric. Since $\varphi^*$ is a finite measure on the Borel sets, we have $\varphi^*_K=\lim_j\varphi^*_{U_j}$ everywhere. Combining this with the last inequality gives $\bar\varphi_K\le\varphi^*_K$ $[\nu]$ since there are only countably many $j$ involved.
Now consider any $\epsilon>0$ and any $\omega\in\Omega$. We have for every $\alpha$, $\rho(\omega,\varphi_\alpha)\le m_s$, where $\Omega_s$ is the particular orbit to which $\omega$ belongs. A simple consequence of this inequality is that for the compact set $K=\{a: L(\omega,a)\le m_s/\epsilon\}$ and for every $\alpha$ there holds

\[
\int\varphi_\alpha(K\mid z)\,dP_\omega(z)\ge 1-\epsilon.
\]

Hence by the limit point condition, since $dP_\omega(z)=p(z\mid\omega)\,d\nu(z)$, we have

\[
\int\bar\varphi(K\mid z)\,dP_\omega(z)\ge 1-\epsilon,
\]

so that by the inequality just proved,

\[
\int\varphi^*(K\mid z)\,dP_\omega(z)\ge 1-\epsilon,
\]

therefore,

\[
\int\varphi^*(A\mid z)\,dP_\omega(z)\ge 1-\epsilon.
\]

But $\epsilon$ is arbitrary, and as $A$ is open, we know $\varphi^*(A\mid z)\le 1$ everywhere; hence, $\varphi^*(A\mid z)=1$ almost everywhere with respect to $P_\omega$, for each $\omega\in\Omega$. From the equivalence of $\nu$ to the family of probability measures $P_\omega$, there clearly follows $\varphi^*(A\mid z)=1$ $[\nu]$.

We might mention in passing that this newly constructed procedure $\varphi^*$ is a limit point for the net of procedures $\varphi_\alpha$ when the appropriate (weak-*) topology is put on the space $\Phi$ of randomized decision procedures, so that we have shown that the set of all procedures having risk $\le m_s$ for each $\omega\in\Omega_s$, where the $m_s$ are preassigned finite numbers, is compact in the weak-* topology. [The appropriate weak-* topology for procedures is characterized as follows. A procedure $\varphi$ is a limit point of a net $\{\varphi_\alpha\}$ of procedures if for every compact set $K$ and every non-negative integrable function $f$, there holds the inequality $\limsup_\alpha\int\varphi_\alpha(K\mid z)f(z)\,d\nu(z)\le\int\varphi(K\mid z)f(z)\,d\nu(z)$. Since $\lim_\alpha\int\varphi_{\alpha K}f\,d\nu=\int\bar\varphi_K f\,d\nu$ and $\bar\varphi_K\le\varphi^*_K$ $[\nu]$ and $f\ge 0$, the inequality $\lim_\alpha\int\varphi_{\alpha K}f\,d\nu\le\int\varphi^*_K f\,d\nu$ follows.]

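C) We now show that this new procedure $\varphi^*$ just constructed satisfies the conclusion of the theorem.

We consider its risk function, making the Lebesgue integral transformation and interchanging orders of integration when necessary.

\[
\begin{aligned}
\rho(\omega,\varphi^*)
&=\int_0^\infty\!\!\int\varphi^*(\{a:L(\omega,a)>h\}\mid z)\,p(z\mid\omega)\,d\nu(z)\,dh\\
&=\int_0^\infty\!\!\int[1-\varphi^*(\{a:L(\omega,a)\le h\}\mid z)]\,p(z\mid\omega)\,d\nu(z)\,dh\\
&=\lim_{H\to\infty}\Bigl(H-\int_0^H\!\!\int\varphi^*(\{a:L(\omega,a)\le h\}\mid z)\,p(z\mid\omega)\,d\nu(z)\,dh\Bigr)\\
&\le\lim_{H\to\infty}\Bigl(H-\int_0^H\lim_\alpha\int\varphi_\alpha(\{a:L(\omega,a)\le h\}\mid z)\,p(z\mid\omega)\,d\nu(z)\,dh\Bigr)\\
&=\lim_{H\to\infty}\lim_\alpha\Bigl(H-\int_0^H\!\!\int\varphi_\alpha(\{a:L(\omega,a)\le h\}\mid z)\,p(z\mid\omega)\,d\nu(z)\,dh\Bigr)\\
&=\lim_{H\to\infty}\lim_\alpha\int_0^H\!\!\int[1-\varphi_\alpha(\{a:L(\omega,a)\le h\}\mid z)]\,p(z\mid\omega)\,d\nu(z)\,dh\\
&\le\lim_\alpha\int_0^\infty\!\!\int\varphi_\alpha(\{a:L(\omega,a)>h\}\mid z)\,p(z\mid\omega)\,d\nu(z)\,dh\\
&=\lim_\alpha\rho(\omega,\varphi_\alpha)\\
&\le\sup_{\omega'\in\Omega_s}\rho(\omega',\varphi),
\end{aligned}
\]

where $\Omega_s$ is the particular orbit to which $\omega$ belongs, hence $\varphi^*$ is at least as good as $\varphi$ in the modified minimax sense.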

It remains to show that $\varphi^*$ is an almost invariant procedure. To do this, we shall require the following lemma.

Lemma. The asymptotic right invariance of $\{\mu_\alpha\}$ implies for fixed $g\in G$ that

\[
\lim_\alpha\int\psi(g')\,[d\mu_\alpha(g'\cdot g^{-1})-d\mu_\alpha(g')]=0
\]

for every bounded measurable function $\psi$ of $g'$.

Proof. This follows directly from the definition of integral. Without loss of generality, we assume $|\psi|<1$. Partitioning the interval $[-1, 1]$ into equal subintervals of length $1/M$, we have at once for each positive integer $M$ the inequality

\[
\left|\int\psi(g')\,[d\mu_\alpha(g'\cdot g^{-1})-d\mu_\alpha(g')]\right|
\le\frac{2}{M}+\left|\sum_{k=-M}^{M-1}\frac{k}{M}\Bigl[\mu_\alpha\Bigl(\Bigl\{g':\tfrac{k}{M}\le\psi(g')<\tfrac{k+1}{M}\Bigr\}\cdot g^{-1}\Bigr)-\mu_\alpha\Bigl(\Bigl\{g':\tfrac{k}{M}\le\psi(g')<\tfrac{k+1}{M}\Bigr\}\Bigr)\Bigr]\right|.
\]

For fixed $M$, the second square bracket approaches zero as $\alpha$ approaches infinity, hence $\limsup_\alpha\bigl|\int\psi(g')\,[d\mu_\alpha(g'\cdot g^{-1})-d\mu_\alpha(g')]\bigr|\le 2/M$. As this is true for every $M$, the lemma follows.

Let, now, $g\in G$ and a compact set $K$ be given. We shall first show that $\bar\varphi(g_A K\mid g_Z(z))=\bar\varphi(K\mid z)$ $[\nu]$. To do this, it suffices to show that

\[
\int\bigl(\bar\varphi(g_A K\mid g_Z(z))-\bar\varphi(K\mid z)\bigr)f(z)\,d\nu(z)=0
\]

for every integrable function $f$. Since $\bar\varphi$ is a limit point of the $\varphi_\alpha$ in the weak-* sense, the formula for $\varphi_\alpha$ in terms of $\varphi$ makes this last integral

\[
\lim_\alpha\int\Bigl[\int\varphi(g'_A K\mid g'_Z(z))f(z)\,d\nu(z)\Bigr][d\mu_\alpha(g'\cdot g^{-1})-d\mu_\alpha(g')].
\]

But $\psi(g')=\int\varphi(g'_A K\mid g'_Z(z))f(z)\,d\nu(z)$ is a measurable function of $g'$ and is bounded by $\int|f|\,d\nu$, so our integral is indeed zero by our lemma.

We now show that $\varphi^*(g_A K\mid g_Z(z))=\varphi^*(K\mid z)$ $[\nu]$. Let $U$, an open set, and $L$, a compact set, be given such that $K\subset U\subset L$. $\varphi^*_U$ is by definition the supremum of the $\bar\varphi_R$, all of which are less than or equal to $\bar\varphi_L$, so that $\varphi^*_U\le\bar\varphi_L$ $[\nu]$, thus permitting us to write $\varphi^*_K\le\varphi^*_U\le\bar\varphi_L\le\varphi^*_L$ $[\nu]$. Applying the group element $g$ to these inequalities, we may extract the following:

\[
\varphi^*(g_A K\mid g_Z(z))\le\bar\varphi(g_A L\mid g_Z(z))=\bar\varphi(L\mid z)\le\varphi^*(L\mid z)\quad[\nu].
\]

We may also extract

\[
\varphi^*(K\mid z)\le\bar\varphi(L\mid z)=\bar\varphi(g_A L\mid g_Z(z))\le\varphi^*(g_A L\mid g_Z(z))\quad[\nu].
\]

Taking a sequence $\{L_n\}$ of such compact sets with $K=\bigcap_n L_n$ and passing to the limit we get $\varphi^*(g_A K\mid g_Z(z))\le\varphi^*(K\mid z)\le\varphi^*(g_A K\mid g_Z(z))$ $[\nu]$, thus establishing the equality $[\nu]$.

From the nature of our topology on $A$ and the way $\mathcal{A}$ is generated by the compact sets $K$, this last equality extends by a standard transfinite induction argument to every set $T\in\mathcal{A}$, thus establishing the almost invariance of $\varphi^*$. The argument goes as follows.

Note first that $\mathcal{A}$ is equal to the smallest monotone class $\mathcal{M}(\mathcal{K})$ containing the compact sets: $A$ is $\sigma$-compact (the union of a sequence of compact sets). The $\sigma$-ring of subsets of $A$ generated by $\mathcal{K}$, containing the whole space $A$, is therefore a Borel field, thus equal to the Borel field $\mathcal{A}$ generated by $\mathcal{K}$. Obviously $\mathcal{M}(\mathcal{K})\subset\mathcal{A}$. The proper difference $K-L$ of two compact sets belongs to $\mathcal{M}(\mathcal{K})$ since, writing $L=\bigcap_n U_n$ the intersection of open sets, we have $K-L=\bigcup_n(K-U_n)$, which is $\sigma$-compact. Therefore the class of all finite unions of these proper differences belongs to $\mathcal{M}(\mathcal{K})$. Thus (see Halmos [3] Theorem F, p. 223) $\mathcal{M}(\mathcal{K})$ contains a ring containing $\mathcal{K}$, so that (Halmos again, Theorem B, p. 27) $\mathcal{M}(\mathcal{K})\supset\mathcal{A}$, hence $\mathcal{M}(\mathcal{K})=\mathcal{A}$.

Write $\mathcal{A}_0=\mathcal{K}=\{\text{the compact sets}\}$ and for every ordinal number $\alpha$ let $\mathcal{A}_\alpha=\{\text{all monotone limits of sets } S_n, \text{ where } S_n\in\mathcal{A}_{\beta_n} \text{ and } \beta_n<\alpha\}$. For every $\alpha$, $\mathcal{A}_\alpha$ is well-defined. Then $\mathcal{A}=\mathcal{M}(\mathcal{K})$ means that $\mathcal{A}=\bigcup_{\alpha<\omega_1}\mathcal{A}_\alpha$ where $\omega_1$ is the first uncountable ordinal. Having thus characterized $\mathcal{A}$, the proof proceeds by transfinite induction on $\alpha$: Suppose our equality $[\nu]$ holds for all $\mathcal{A}_\beta$ with $\beta<\alpha$. Consider any set $T\in\mathcal{A}_\alpha$. Then $T=\lim T_n$ where $T_n\in\mathcal{A}_{\beta_n}$ and $\beta_n<\alpha$. The equality $[\nu]$ holds for these $T_n$ by the induction hypothesis, therefore it holds for the limit $T$ since there are only a countable number of sets of $\nu$-measure 0 involved. The equality $[\nu]$ holding for all $T\in\mathcal{A}_\alpha$, for all $\alpha$, it holds for every $T\in\mathcal{A}$, and $\varphi^*$ is almost invariant.

D) With our assumptions on $(G,\mathcal{C})$ we now proceed to replace this almost invariant procedure $\varphi^*$ by an invariant procedure $\varphi^{**}$ such that $\varphi^{**}=\varphi^*$ $[\nu]$, thereby completing the proof of the theorem.

Let us return to the class $\mathcal{R}$ of part (B). Since $G$ is a locally compact topological group, there is a right invariant Haar measure $\lambda$ on $(G,\mathcal{C})$. We define a function $\varphi^{**}$ as follows: For each $R\in\mathcal{R}$

i) let $\varphi^{**}(R\mid z)=\varphi^*(R\mid z)$, for all $z$ such that $\varphi^*(g_A R\mid g_Z(z))=\varphi^*(R\mid z)$ $[\lambda]$;

ii) for all such $z$ in (i), and each $g$, let $\varphi^{**}(g_A R\mid g_Z(z))=\varphi^*(R\mid z)$;

iii) let $\varphi^{**}(R\mid z)=0$ elsewhere.

To show that (i) and (ii) are not contradictory, suppose there existed $R_1$, $R_2$, $z_1$, $z_2$, $g_1$ with $R_2=g_{1A}R_1$, $z_2=g_{1Z}(z_1)$, and

\[
\varphi^*(g_A R_1\mid g_Z(z_1))=\varphi^*(R_1\mid z_1)\quad[\lambda],\qquad
\varphi^*(g_A R_2\mid g_Z(z_2))=\varphi^*(R_2\mid z_2)\quad[\lambda],
\]

but with $\varphi^*(R_1\mid z_1)\ne\varphi^*(R_2\mid z_2)$. Then it would follow that

\[
\begin{aligned}
0&=\lambda\{g:\varphi^*(g_A R_2\mid g_Z(z_2))\ne\varphi^*(R_2\mid z_2)\}\\
&\ge\lambda\{g:\varphi^*(g_A R_2\mid g_Z(z_2))=\varphi^*(R_1\mid z_1)\}\\
&=\lambda\{g:\varphi^*(g_A g_{1A}R_1\mid g_Z(g_{1Z}(z_1)))=\varphi^*(R_1\mid z_1)\},
\end{aligned}
\]

which by the right invariance of $\lambda$

\[
\begin{aligned}
&=\lambda\{g:\varphi^*(g_A R_1\mid g_Z(z_1))=\varphi^*(R_1\mid z_1)\}\\
&=\lambda(G).
\end{aligned}
\]

But $0\ge\lambda(G)$ contradicts the fact that $\lambda$ is a Haar measure, hence $\varphi^{**}$ is not contradictorily defined.
We now show, for each $R\in\mathcal{R}$, that

\[
\varphi^{**}(R\mid z)=\varphi^*(R\mid z)\quad[\nu].
\]

Let

\[
\begin{aligned}
V&=\{(z,g):\varphi^*(g_A R\mid g_Z(z))\ne\varphi^*(R\mid z)\},\\
V_z&=\{g:\varphi^*(g_A R\mid g_Z(z))\ne\varphi^*(R\mid z)\},\\
V_g&=\{z:\varphi^*(g_A R\mid g_Z(z))\ne\varphi^*(R\mid z)\}.
\end{aligned}
\]

The almost invariance of $\varphi^*$ implies that $\nu(V_g)=0$ for each $g$. Moreover, $\lambda$ is $\sigma$-finite since $G$ was assumed $\sigma$-compact, hence by Fubini's theorem (Halmos [3] Theorem A, p. 147) $(\nu\times\lambda)(V)=0$. By the same reference this implies $\lambda(V_z)=0$ $[\nu]$, hence $\varphi^{**}(R\mid z)=\varphi^*(R\mid z)$ $[\nu]$.

Since $\mathcal{R}$ is countable, we may disregard sets of $\nu$-measure 0, so that for all $R$, $z$ and $g$ we have

\[
\varphi^{**}(g_A R\mid g_Z(z))=\varphi^*(R\mid z)=\varphi^{**}(R\mid z).
\]

Extending this equality in the obvious way to the open sets $U$, then to the compact sets $K$, a transfinite induction on $\alpha$ exactly as in part (C) above extends it to all the Borel sets $T\in\mathcal{A}$, thus completing the proof of the generalized Hunt-Stein theorem.

Remarks. It may prove useful in certain applications to point out an easy extension of the above result. Clearly, everything we have said goes through if $\Phi$ is any invariant, convex and closed set of randomized decision procedures rather than the class of all of them. That is, under our regularity conditions, if $\Phi$ is closed in the weak-* topology, and if for every probability measure $\lambda$ on $(G,\mathcal{C})$ and every $\varphi\in\Phi$, the procedure $\varphi_\lambda$ given by $\varphi_\lambda(T\mid z)=\int\varphi(g_A T\mid g_Z(z))\,d\lambda(g)$ belongs to $\Phi$, then for each $\varphi\in\Phi$ there is an almost invariant $\varphi^*\in\Phi$ which is at least as good as $\varphi$ in the modified minimax sense, and so on. For example, the classical Hunt-Stein theorem follows from this by taking $\Phi$ to be the set of all procedures $\varphi$ satisfying the inequality $\sup_{\omega\in\Omega_0}\rho(\omega,\varphi)\le\alpha$.

The condition in hypothesis (ii) that $L(\omega,a)$ be non-negative is not essential. We only require it to be bounded from below, and this is automatically assured by the compactness condition.

An example of theoretical interest. Let $\{P_\omega\}$ be the set of all probability distributions on the positive integers, $E$ the set of even integers, $A=\{0,1\}$, $L(\omega,0)=0$ if $P_\omega(E)>1/2$ and $=1$ if $P_\omega(E)\le 1/2$, $L(\omega,1)=1$ if $P_\omega(E)>1/2$ and $=0$ if $P_\omega(E)\le 1/2$, an action to be taken after observing one positive integer.

Let $\varphi$ be a strategy, where $\varphi(n)$ is the probability of taking action 0 (saying "even") when integer $n$ is observed. It can be shown, but only after a complicated argument, that $\varphi$ is a Bayes solution if and only if it does "the right thing" for at least one integer $n$, i.e., if $\varphi(2n)=1$ or $\varphi(2n-1)=0$ for at least one integer $n$. Further, all these strategies are admissible, but the class of all admissible strategies is not known. A plausible conjecture seems to be that $\varphi$ is admissible if and only if $\inf_n\min\{\varphi(2n-1),\,1-\varphi(2n)\}=0$. Thus we see that even in the simple case of one observation, the classes are extremely large.

Let us consider this problem in the light of the invariance principle. Let $g$ be the particular kind of permutation of the integers which, for some $n$, permutes the first $n$ even numbers and the first $n$ odd numbers separately and leaves the tail untouched. The set $G$ of all such permutations for all $n$ clearly forms a group, containing countably many elements $g$, which leaves the problem invariant. Let $\mathcal{C}$ be the set of all subsets of $G$. The set $Z$ of all integral outcomes breaks up into two orbits under the group, which we may write as $E$ and $O$ for "even" and "odd." The orbits $\Omega_s$ of $\Omega$ (closed in a suitable topology, a matter we need not go into here) are given by

\[
\Omega_s=\{\omega:P_\omega(E)=s\}
\]

for all $s$ belonging to the closed interval $[0, 1]$. There is an asymptotically (right) invariant sequence of probability measures on $(G,\mathcal{C})$, as follows: Let $\mu_n$ be the measure which assigns equal probabilities of $1/(n!)^2$ to each of the $(n!)^2$ elements $g$ affecting the first $2n$ integers only, and 0 to the remaining elements of $G$. To prove the asymptotic invariance of this sequence, we simply note that given $g\in G$, there is an integer $n_0$ such that $g$ affects the first $2n_0$ integers only, so that for every $C\in\mathcal{C}$ we necessarily have $\mu_n(Cg)=\mu_n(C)$ for all $n\ge n_0$. By the generalized Hunt-Stein theorem, then, the problem reduces to the classical binomial case, say of tossing a coin once with unknown probability $s$ of showing heads (even).
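The invariance $\mu_n(Cg)=\mu_n(C)$ for $n\ge n_0$ can be checked by brute force for small $n$. The following is only an illustrative sketch: the tuple encoding of permutations, the helper names (`group_elements`, `extend`, `compose`, `mu`) and the particular choices of $g$ and $C$ are assumptions made here, not part of the paper.

```python
from fractions import Fraction
from itertools import permutations

def group_elements(n):
    """All g that permute the first n odd and the first n even integers separately.

    Each g is a tuple whose (i-1)th entry is the image of i, for i = 1..2n."""
    odds = [2 * i + 1 for i in range(n)]
    evens = [2 * i + 2 for i in range(n)]
    elements = []
    for odd_images in permutations(odds):
        for even_images in permutations(evens):
            image = dict(zip(odds, odd_images))
            image.update(zip(evens, even_images))
            elements.append(tuple(image[i] for i in range(1, 2 * n + 1)))
    return elements

def extend(g, n):
    """Regard g, acting on 1..len(g), as acting on 1..2n with the tail fixed."""
    return g + tuple(range(len(g) + 1, 2 * n + 1))

def compose(c, g):
    """The product cg, taken here to mean: apply g first, then c."""
    return tuple(c[g[i] - 1] for i in range(len(g)))

def mu(n, C):
    """mu_n(C): uniform probability on the (n!)^2 elements affecting 1..2n only."""
    support = set(group_elements(n))
    return Fraction(len(support & set(C)), len(support))

n0, n = 2, 3
g = extend(group_elements(n0)[3], n)   # a g affecting the first 2*n0 integers only
C = set(group_elements(n)[:10])        # an arbitrary set of group elements
Cg = {compose(c, g) for c in C}        # the right translate Cg
print(mu(n, C), mu(n, Cg))
```

Right translation by $g$ permutes the support of $\mu_n$ whenever $n\ge n_0$, which is why the two printed values agree.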


5. Mixed games, and the use of previous experience. In this section we present 
a purely game-theoretic model for the modified minimax principle, and suggest 
how by means of this model and our previous experience the principle might be 
applied in practice to bridge the gap between the Bayes and minimax approach. 

Mixed Games. Let there be given a pair of games $G_1=(X_1,Y,M_1)$ and $G_2=(X_2,Y,M_2)$ with the same $Y$-space of randomized strategies available to the second player. Let $p$ and $q=1-p$ be fixed probabilities with which player I is to play the respective games. We shall call such a setup a mixed game for player I. A strategy for the first player is then equivalent to an ordered couple $(x_1,x_2)$, $x_1\in X_1$, $x_2\in X_2$, signifying his choice of strategy depending on the game he must eventually play (say, after a coin with probability $p$ of heads is tossed). The payoff function $M$ is clearly given by

\[
M((x_1,x_2),y)=pM_1(x_1,y)+qM_2(x_2,y).
\]

We denote the mixed game by $G=(X_1\times X_2,Y,M)$.

More generally, given $G_i=(X_i,Y,M_i)$, $i=1,2,\cdots$ and probabilities $p_i$ of playing the game $G_i$ ($p_i\ge 0$, $\sum_i p_i=1$), the mixed game for player I is equivalent to the game $G=(X,Y,M)$ where

\[
X=X_1\times X_2\times X_3\times\cdots,\qquad x=(x_1,x_2,\cdots)\in X
\]

is a strategy for the first player, and $M(x,y)=\sum_i p_iM_i(x_i,y)$.

In the most general case, we have games $G_s=(X_s,Y,M_s)$ with $s\in S$ an index set, $\mathcal{S}$ a Borel field in $S$, and $P$ a probability measure on $\mathcal{S}$. A strategy for player I is now a function $f$ on $S$ with $f(s)\in X_s$ for each $s\in S$ (i.e., $f$ is a point in the function space $X=\times_{s\in S}X_s$) and

\[
M(f,y)=\int M_s(f(s),y)\,dP(s).
\]


Let us now consider the minimax principle from the point of view of the second player. For simplicity we confine ourselves to the discrete case, although the argument is general. For each strategy $y\in Y$, the second player seeks to minimize $\sup_x M(x,y)$ over $y$. But $\sup_x M(x,y)=\sum_i p_i\sup_{x_i\in X_i}M_i(x_i,y)=\sum_i p_i\,a(i,y)$ is just the average with respect to the $p_i$ of the sliced up risk function in the modified minimax sense, where each slice is the old payoff function $M_i$ for the $i$th game. (The point $(a(1,y),a(2,y),\cdots)$ is the generalization of the $\alpha$-$\beta$ set for testing composite hypotheses.) Hence playing minimax in the mixed game is equivalent to playing Bayes in the sliced up game.
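The identity $\sup_x M(x,y)=\sum_i p_i\,a(i,y)$ can be illustrated on a small finite example. The payoff tables `M1`, `M2` and the probabilities below are hypothetical numbers chosen for this sketch, and only pure strategies $y$ are compared.

```python
from fractions import Fraction
from itertools import product

# Hypothetical payoff tables M_i[x_i][y]; both games share the same three y's.
M1 = [[3, 0, 2],
      [1, 4, 2]]
M2 = [[0, 5, 1],
      [2, 1, 3],
      [4, 0, 2]]
games = [M1, M2]
p = [Fraction(1, 4), Fraction(3, 4)]   # probabilities of playing G_1, G_2
n_y = 3

def mixed_payoff(x, y):
    """M(x, y) = sum_i p_i M_i(x_i, y) for a strategy tuple x = (x_1, x_2)."""
    return sum(p_i * M[x_i][y] for p_i, M, x_i in zip(p, games, x))

def sup_mixed(y):
    """sup over all strategy tuples x of the mixed-game payoff."""
    all_x = product(*(range(len(M)) for M in games))
    return max(mixed_payoff(x, y) for x in all_x)

def averaged_slices(y):
    """sum_i p_i a(i, y), where a(i, y) = sup over x_i of M_i(x_i, y)."""
    return sum(p_i * max(row[y] for row in M) for p_i, M in zip(p, games))

minimax_y = min(range(n_y), key=sup_mixed)       # minimax in the mixed game
bayes_y = min(range(n_y), key=averaged_slices)   # Bayes in the sliced up game
print(minimax_y, bayes_y)
```

Because player I chooses each coordinate $x_i$ separately, the supremum over tuples splits into a weighted sum of suprema, so the two criteria select the same $y$.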

The use of previous experience as a slicing principle: In a paper on the use of previous experience in reaching statistical decisions, Hodges and Lehmann [8] propose a guess at an a priori distribution $\lambda$ for the states of nature on the basis of previous experience, and a safeguard in case the guess is incorrect. If $c$ is the minimax risk, they seek to minimize $\int\rho(\omega,\varphi)\,d\lambda(\omega)$ among all procedures $\varphi$ satisfying the restriction $\sup_\omega\rho(\omega,\varphi)\le c+k$ where $k$ is a given positive number. They call such a minimizing procedure a restricted Bayes solution (we might say the Bayes principle is given a minimax restriction). If $\lambda$ is suspect, $k$ is made small; as confidence in $\lambda$ grows, $k$ may be raised. Note that as $k\to\infty$ or is large enough, the restricted Bayes solution approaches the Bayes solution; as $k\to 0$, the restricted solution approaches the minimax state (for $k=0$, we have a best minimax procedure relative to $\lambda$).

Another way of looking at the modified minimax principle, with a view to applying it in practice, suggests that we reverse this procedure. That is to say, to start with the minimax principle and then modify it gradually or greatly as previous experience is gathered: to assume less and trust it completely, a sort of Bayesian modification of the minimax principle. Specifically, to take the most simple case, we might use the best of our information to break up or stratify $\Omega$ into $\Omega_1$ and $\Omega_2$ with certain probabilities $p$ and $1-p$ attached thereto, and then play minimax in the resulting mixed game. (Say, a machine is known to produce with probability $p=.95$ coins whose probabilities of showing heads when tossed lie between .4 and .6. $\Omega_1$ is then the interval from .4 to .6, $\Omega_2$ its complement in the unit interval, and the corresponding games are mixed in the ratio .95 to .05. If the probability that $\omega\in\Omega_1$ is at least $p$, then we would mix $\Omega_1$ and all of $\Omega$ in the proportion $p$ to $1-p$, and so on.) As more knowledge is acquired, we might adjust the value of $p$, or better still, feel confident enough to partition $\Omega$ into a larger number of sets with certain $p_i$ attached, or even into a family $\{\Omega_s\}$ of sets with a probability measure $P$ over it, and then play minimax in the resulting mixed game. By the above, this is equivalent to playing Bayes in the sliced up game. If each $\Omega_s$ consists of only one point, we are really postulating an a priori probability distribution $P$ over the states of nature, so that minimax in the mixed game is exactly Bayes for the original game. Thus we see how previous information might be used in any given problem to slice $\Omega$ into subsets, and how it is possible to go from the minimax extreme to the Bayes in easy stages, by a gradual modification of the minimax principle.
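As a rough numerical sketch of the coin illustration, one may evaluate the sliced criterion $p\sup_{\Omega_1}\rho+(1-p)\sup_{\Omega_2}\rho$ on a grid. Everything specific here is an assumption made for illustration only: the decision problem (estimating $s$ from one toss under squared-error loss), the grids, and the function names are not taken from the paper.

```python
def risk(s, d0, d1):
    """Assumed squared-error risk of guessing d1 after heads and d0 after tails."""
    return s * (d1 - s) ** 2 + (1 - s) * (d0 - s) ** 2

grid_s = [j / 100 for j in range(101)]
omega_1 = [s for s in grid_s if 0.4 <= s <= 0.6]       # the slice given weight p
omega_2 = [s for s in grid_s if not 0.4 <= s <= 0.6]   # its complement in [0, 1]
p = 0.95

def criterion(d0, d1):
    """p * sup over Omega_1 of the risk + (1 - p) * sup over Omega_2 of the risk."""
    return (p * max(risk(s, d0, d1) for s in omega_1)
            + (1 - p) * max(risk(s, d0, d1) for s in omega_2))

# Candidate rules (d0, d1) on a coarse grid; (0.25, 0.75) is the usual
# minimax estimator for a single toss under squared-error loss.
candidates = [(i / 20, j / 20) for i in range(21) for j in range(21)]
best = min(candidates, key=lambda d: criterion(*d))
print(best, criterion(*best), criterion(0.25, 0.75))
```

The rule minimizing the sliced criterion on this grid can do no worse, by that criterion, than the ordinary minimax rule, which is one of the candidates.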

Acknowledgment. I wish to express my deep personal indebtedness to the late Professor M. A. Girshick for suggesting this topic, and for his interest and guidance throughout my statistical studies. I am indebted to Professor Herman Rubin for several helpful discussions. I wish to thank the referee for pointing out references [9] and [10], as papers containing further useful remarks on the Hunt-Stein theorem and the role of invariance in Statistics; I am glad to call them to the attention of the reader. To the Office of Naval Research my thanks for its support of this work.


REFERENCES 


[1] David Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions, John Wiley & Sons, New York, 1954.

[2] Abraham Wald, Statistical Decision Functions, John Wiley & Sons, New York, 1950.

[3] Paul R. Halmos, Measure Theory, D. Van Nostrand, New York, 1950.

[4] G. Hunt and C. Stein, "Most stringent tests of composite hypotheses," unpublished.

[5] S. Banach, Théorie des opérations linéaires, Warszawa, 1932.

[6] Paul R. Halmos and L. J. Savage, "Application of the Radon-Nikodym theorem to the theory of sufficient statistics," Ann. Math. Stat., Vol. 20 (1949), pp. 225-241.

[7] J. L. Kelley, General Topology, D. Van Nostrand, New York, 1955.

[8] J. L. Hodges, Jr. and E. L. Lehmann, "The use of previous experience in reaching statistical decisions," Ann. Math. Stat., Vol. 23 (1952), pp. 396-407.

[9] E. L. Lehmann, "Some principles of the theory of testing hypotheses," Ann. Math. Stat., Vol. 21 (1950), pp. 1-26.

[10] J. Kiefer, "Invariance, minimax sequential estimation, and continuous time processes," Ann. Math. Stat., Vol. 28 (1957), pp. 573-601.





ON LINEAR ASSOCIATIVE ALGEBRAS CORRESPONDING TO 
ASSOCIATION SCHEMES OF PARTIALLY BALANCED 
DESIGNS 


By R. C. Bose and Dale M. Mesner¹


University of North Carolina and Michigan State University 


1. Introduction. Given $v$ objects $1, 2, \cdots, v$, a relation satisfying the following conditions is said to be an association scheme with $m$ classes:

(a) Any two objects are either 1st, or 2nd, $\cdots$, or $m$th associates, the relation of association being symmetrical, i.e., if the object $\alpha$ is the $i$th associate of the object $\beta$, then $\beta$ is the $i$th associate of $\alpha$.

(b) Each object $\alpha$ has $n_i$ $i$th associates, the number $n_i$ being independent of $\alpha$.

(c) If any two objects $\alpha$ and $\beta$ are $i$th associates, then the number of objects which are $j$th associates of $\alpha$, and $k$th associates of $\beta$, is $p^i_{jk}$ and is independent of the pair of $i$th associates $\alpha$ and $\beta$.

The numbers $v$, $n_i$ $(i=1,2,\cdots,m)$ and $p^i_{jk}$ $(i,j,k=1,2,\cdots,m)$ are the parameters of the association scheme.
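Conditions (a) to (c) can be checked mechanically for a small relation. The sketch below uses a hypothetical two-class scheme on six objects, split into the groups {0,1}, {2,3}, {4,5}, chosen only for illustration; the function and variable names are likewise assumptions.

```python
from itertools import product

v, m = 6, 2
objects = range(v)

def assoc(a, b):
    """Hypothetical scheme: 1st associates share a group {0,1}, {2,3} or {4,5};
    all other pairs of distinct objects are 2nd associates."""
    return 1 if a // 2 == b // 2 else 2

pairs = [(a, b) for a, b in product(objects, repeat=2) if a != b]

# (a) the relation is symmetrical
symmetric = all(assoc(a, b) == assoc(b, a) for a, b in pairs)

# (b) n_i: collect the number of ith associates over all objects
n = {i: {sum(1 for b in objects if b != a and assoc(a, b) == i) for a in objects}
     for i in (1, 2)}

# (c) p^i_{jk}: collect the counts over all pairs of ith associates
def count(a, b, j, k):
    return sum(1 for c in objects
               if c not in (a, b) and assoc(a, c) == j and assoc(b, c) == k)

p = {(i, j, k): {count(a, b, j, k) for a, b in pairs if assoc(a, b) == i}
     for i, j, k in product((1, 2), repeat=3)}

# The relation is an association scheme when every collected set is a singleton.
is_scheme = (symmetric
             and all(len(values) == 1 for values in n.values())
             and all(len(values) == 1 for values in p.values()))
print(is_scheme, n, p)
```

For this relation the checks give $n_1=1$, $n_2=4$, and for instance $p^1_{22}=4$ and $p^2_{22}=2$.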

If we have an association scheme with m classes and given parameters, then 
we get a partially balanced design with r replications and b blocks if we can ar- 
range the v objects into b sets (each set corresponding to a block) such that 

(i) each set contains k objects (all different) ; 
(ii) each object is contained in r sets; 

(iii) if two objects $\alpha$ and $\beta$ are $i$th associates, then they occur together in $\lambda_i$ sets, the number $\lambda_i$ being independent of the particular pair of $i$th associates $\alpha$ and $\beta$.

Partially balanced designs were introduced in experimental studies by Bose 
and Nair [5], and have recently come into fairly general practical use. The con- 
cept of the association scheme, though inherent in Bose and Nair’s definition, 
was explicitly introduced by Bose and Shimamoto [6], as an aid to the classifi- 
cation and analysis of partially balanced designs. 


2. Association schemes as concordant graphs. An association scheme with v 
objects and m classes may be visualized as follows: 

Let the objects be points. Suppose we have $m$ colors $C_1, C_2, \cdots, C_m$. If two objects are $i$th associates we connect them by a segment of the $i$th color. The points together with the segments of the $i$th color form a linear graph which will


Received February 21, 1958. 

1 The research by the first author was supported in part by the United States Air Force 
under contract AF 18(600)-83, monitored by the Office of Scientific Research. 

Some of the work of the second author appeared in his doctoral dissertation at the 
Michigan State University [15]. He is now an NRC-NBS Post Doctoral Research Associate 
at the Statistical Engineering Laboratory, National Bureau of Standards. 




be regular of degree $n_i$ as a result of property (b). We may say that the $m$ graphs together are concordant² when properties (a) and (c) are also satisfied, the meanings of these being as follows:

a) Every pair of points is connected by a single segment of one of the $m$ colors. The graphs are non-oriented.

c) If any two points $\alpha$ and $\beta$ are connected by a segment of the $i$th color, then the number of points which are connected to $\alpha$ by a segment of color $C_j$ and to $\beta$ by a segment of color $C_k$, is $p^i_{jk}$ and is independent of the particular pair of points chosen.

Equivalently $p^i_{jk}$ is the number of 2-chains directed from $\alpha$ to $\beta$ and consisting of segments of colors $C_j$ and $C_k$ in that order. Clearly the $p^i_{jk}$ are closely related to the number of triangles in the graph formed of segments of colors $C_i$, $C_j$, $C_k$. Properties (a), (b) and (c) are just enough to specify the number of segments of each color on each point, and the number of triangles of each combination of colors on each segment. The total number of segments, the total number of 2-chains, and the total number of triangles in the graph are then readily determined. Methods based on the incidence matrices of the graphs [16] can be used with (3.6) to enumerate certain chains of more than two segments. The arrangement in these graphs of all configurations involving two points or three points shows a striking regularity which does not extend to configurations having more than three points. It can be shown by examples that the points of the graph of color $C_i$ may not all lie on the same number of complete 4-points, and that two association schemes with the same parameter values may give graphs differing in the total number of complete 4-points. This shows that the structure of concordant graphs is not determined completely by properties (a) to (c).


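3. Association matrices. We define³

\[
B_i=(b^\beta_{\alpha i})=
\begin{pmatrix}
b^1_{1i} & b^2_{1i} & \cdots & b^v_{1i}\\
\vdots & \vdots & & \vdots\\
b^1_{vi} & b^2_{vi} & \cdots & b^v_{vi}
\end{pmatrix},
\]

where

$b^\beta_{\alpha i}=1$, if the objects $\alpha$ and $\beta$ are $i$th associates (or connected by a segment of the $i$th graph);
$\phantom{b^\beta_{\alpha i}}=0$, otherwise.

$B_i$ is a symmetric matrix, in which each row total and each column total is $n_i$. Let each object be the zeroth associate of itself and of no other treatment.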


2 Not to be confused with chromatic graphs, in which points, not segments, are colored. 
For a general discussion of linear graphs, see [11]. 

3 The convention will be adopted here of using a superscript as the column index of a matrix, the first subscript as the row index, and the second subscript as the index of the matrix itself. This choice is dictated by the notation already established for the parameters $p^i_{jk}$.





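Then

\[
\begin{aligned}
B_0&=I_v,\ \text{the } v\times v \text{ identity matrix},\\
n_0&=1,\\
p^0_{ij}&=\begin{cases}n_i & \text{if } i=j,\\ 0 & \text{otherwise},\end{cases}\\
p^i_{0k}&=\begin{cases}1 & \text{if } i=k,\\ 0 & \text{otherwise},\end{cases}\\
\lambda_0&=r\ \text{for designs}.
\end{aligned}
\]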

The following identities are known [5] and can be proved easily by combinatorial methods. Proofs based on the matrices $B_i$ will be given in Section 5.

\[
\sum_{i=0}^m n_i=v,\qquad
\sum_{j=0}^m p^i_{jk}=n_k,\qquad
p^i_{jk}=p^i_{kj},\qquad
n_i\,p^i_{jk}=n_j\,p^j_{ik}=n_k\,p^k_{ij}.
\tag{3.2}
\]

Further the following two identities hold for designs:

\[
bk=vr,\qquad \sum_{i=0}^m n_i\lambda_i=rk.
\]

Among the numbers

\[
b^\beta_{\alpha 0},\ b^\beta_{\alpha 1},\ \cdots,\ b^\beta_{\alpha m}
\]

only one is unity, i.e., $b^\beta_{\alpha i}$, if $\alpha$ and $\beta$ are $i$th associates. Hence

\[
B_0+B_1+\cdots+B_m=J_v, \tag{3.3}
\]

where $J_v$ is the $v\times v$ matrix each of whose elements is unity.
It also follows that the linear form

\[
c_0B_0+c_1B_1+\cdots+c_mB_m \tag{3.4}
\]

is equal to the zero matrix if and only if

\[
c_0=c_1=\cdots=c_m=0;
\]

hence the linear functions of $B_0, B_1, \cdots, B_m$ form a vector space with basis $B_0, B_1, \cdots, B_m$.
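To make (3.3) and (3.4) concrete, the association matrices of a small scheme can be written out directly. The scheme below (six objects in groups {0,1}, {2,3}, {4,5}, with group-mates as first associates) is a hypothetical example, and the helper names are assumptions of this sketch.

```python
v, m = 6, 2

def assoc(a, b):
    """Associate class in a hypothetical scheme with groups {0,1}, {2,3}, {4,5}."""
    if a == b:
        return 0                      # each object is its own zeroth associate
    return 1 if a // 2 == b // 2 else 2

# B[i][a][b] = 1 exactly when a and b are ith associates.
B = [[[int(assoc(a, b) == i) for b in range(v)] for a in range(v)]
     for i in range(m + 1)]

# (3.3): the association matrices add up to the all-ones matrix J_v.
total = [[sum(B[i][a][b] for i in range(m + 1)) for b in range(v)]
         for a in range(v)]

# Every row total of B_i should equal n_i.
row_totals = [{sum(row) for row in B[i]} for i in range(m + 1)]

def combination(c):
    """The linear form (3.4) with coefficients c_0, ..., c_m."""
    return [[sum(c[i] * B[i][a][b] for i in range(m + 1)) for b in range(v)]
            for a in range(v)]

print(total[0], row_totals, combination([2, 5, 7])[0])
```

Each coefficient $c_i$ reappears as the entry of the linear form at any pair of $i$th associates, which is the reason the form vanishes only when all the $c_i$ do.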





Lemma 3.1.

\[
\sum_{\gamma=1}^v b^\gamma_{\alpha j}\,b^\beta_{\gamma k}
=p^0_{jk}\,b^\beta_{\alpha 0}+\cdots+p^i_{jk}\,b^\beta_{\alpha i}+\cdots+p^m_{jk}\,b^\beta_{\alpha m}.
\tag{3.5}
\]

The objects $\alpha$ and $\beta$ are zeroth associates if $\alpha=\beta$; otherwise they are either 1st, 2nd, $\cdots$ or $m$th associates from condition (a) of Section 1. Suppose they are $i$th associates. Both terms of the product $b^\gamma_{\alpha j}b^\beta_{\gamma k}$ are unity if and only if $\gamma$ is the $j$th associate of $\alpha$ as well as $k$th associate of $\beta$. Hence from condition (c) Section 1, the left-hand side of (3.5) is $p^i_{jk}$. Again since $\alpha$ and $\beta$ are $i$th associates $b^\beta_{\alpha l}$ is unity if $l=i$ and is zero otherwise. Hence the right-hand side of (3.5) is also equal to $p^i_{jk}$. This proves the Lemma.

We now note that the left-hand side of (3.5) is the element in the $\alpha$th row and $\beta$th column of the product $B_jB_k$, and $b^\beta_{\alpha l}$ is the element in the $\alpha$th row and $\beta$th column of $B_l$ $(l=0,1,\cdots,m)$. Thus⁴

\[
B_jB_k=p^0_{jk}B_0+p^1_{jk}B_1+\cdots+p^m_{jk}B_m. \tag{3.6}
\]

The product of two matrices of the form (3.4), where the $c_i$ are scalars, may be expressed as a linear combination of terms of the form $B_jB_k$ and will reduce to the form (3.4). The set of matrices of this form is therefore closed under multiplication. It is clear that it forms an Abelian group under addition. Thus the linear functions of $B_0, B_1, \cdots, B_m$ form a ring with unit element, which will be a linear associative algebra if the coefficients $c_i$ range over a field. Multiplication is also commutative. This statement and the equivalent statement $p^i_{jk}=p^i_{kj}$ will be shown in Section 5 to follow from (3.6) and the symmetry of $B_i$.
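Formula (3.6) and the commutativity of the $B$'s lend themselves to a direct numerical check. The sketch below again uses a hypothetical two-class scheme on six objects (groups {0,1}, {2,3}, {4,5}); `assoc`, `p`, `matmul` and `rhs` are names introduced here for illustration.

```python
v, m = 6, 2
classes = range(m + 1)

def assoc(a, b):
    """Associate class in a hypothetical scheme with groups {0,1}, {2,3}, {4,5}."""
    if a == b:
        return 0
    return 1 if a // 2 == b // 2 else 2

B = [[[int(assoc(a, b) == i) for b in range(v)] for a in range(v)] for i in classes]

def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(X[a][c] * Y[c][b] for c in range(len(Y))) for b in range(len(Y[0]))]
            for a in range(len(X))]

def p(i, j, k):
    """p^i_{jk}, counted on one representative pair of ith associates."""
    a, b = next((a, b) for a in range(v) for b in range(v) if assoc(a, b) == i)
    return sum(1 for c in range(v) if assoc(a, c) == j and assoc(c, b) == k)

def rhs(j, k):
    """Right-hand side of (3.6): sum over i of p^i_{jk} B_i."""
    return [[sum(p(i, j, k) * B[i][a][b] for i in classes) for b in range(v)]
            for a in range(v)]

formula_holds = all(matmul(B[j], B[k]) == rhs(j, k) for j in classes for k in classes)
commutative = all(matmul(B[j], B[k]) == matmul(B[k], B[j])
                  for j in classes for k in classes)
print(formula_holds, commutative, [p(i, 2, 2) for i in classes])
```

Counting $p^i_{jk}$ on a single representative pair is legitimate only because condition (c) makes the count independent of the pair; for a relation that is not an association scheme the check `formula_holds` would fail.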

Linear associative algebras have of course been extensively studied and are 
treated, for example, in [13]. The properties of most importance in the present 
study are easily established, and brief proofs will now be given for the sake of 
completeness. 

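We first find the consequences of the associative law of matrix multiplication:

\[
\begin{aligned}
B_i(B_jB_k)&=B_i\sum_u p^u_{jk}B_u\\
&=\sum_u p^u_{jk}\,B_iB_u\\
&=\sum_{u,t} p^u_{jk}\,p^t_{iu}\,B_t.
\end{aligned}
\]

Also

\[
\begin{aligned}
(B_iB_j)B_k&=\Bigl(\sum_u p^u_{ij}B_u\Bigr)B_k\\
&=\sum_u p^u_{ij}\,B_uB_k\\
&=\sum_{u,t} p^u_{ij}\,p^t_{uk}\,B_t.
\end{aligned}
\]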

4 The fundamental formula (3.6) was first obtained by W. A. Thompson [17], [19] and was independently discovered by the second author [15]. Other results of Section 3 were included in a set of lectures [2] at the University of Frankfurt by the first author. Some of these were independently obtained in another form by the second author. When the two authors learned of each other's work, they decided to collaborate in a joint paper.





From the independence of $B_0, B_1, \cdots, B_m$,

\[
\sum_u p^u_{ij}\,p^t_{uk}=\sum_u p^u_{jk}\,p^t_{iu}. \tag{3.7}
\]

In these equations the summation over $u$ runs from 0 to $m$ and the remaining indices are arbitrary but fixed, $0\le i,j,k,t\le m$.

Now let us define $\mathcal{P}_k$ by⁵

\[
\mathcal{P}_k=(p^i_{jk})=
\begin{pmatrix}
p^0_{0k} & p^1_{0k} & \cdots & p^m_{0k}\\
\vdots & \vdots & & \vdots\\
p^0_{mk} & p^1_{mk} & \cdots & p^m_{mk}
\end{pmatrix}.
\]

Now the left side of (3.7) is the element in the $i$th row and $t$th column of $\mathcal{P}_j\mathcal{P}_k$. Also the element in the $i$th row and $t$th column of $\mathcal{P}_u$ is $p^t_{iu}$, so that the right side of (3.7) is the element in the $i$th row and $t$th column of

\[
p^0_{jk}\mathcal{P}_0+p^1_{jk}\mathcal{P}_1+\cdots+p^m_{jk}\mathcal{P}_m.
\]

Hence we have

\[
\mathcal{P}_j\mathcal{P}_k=p^0_{jk}\mathcal{P}_0+p^1_{jk}\mathcal{P}_1+\cdots+p^m_{jk}\mathcal{P}_m. \tag{3.8}
\]

Thus, the $\mathcal{P}$'s multiply in the same manner as the $B$'s. Since $p^i_{0k}=1$ if $k=i$ and 0 otherwise, the 0th row of $\mathcal{P}_k$ contains a 1 in column $k$ and 0's in other positions, which is enough to show that if

\[
c_0\mathcal{P}_0+c_1\mathcal{P}_1+\cdots+c_m\mathcal{P}_m=0,
\]

then

\[
c_0=c_1=\cdots=c_m=0;
\]

i.e., $\mathcal{P}_0, \mathcal{P}_1, \cdots, \mathcal{P}_m$ are linearly independent. They thus form the basis for a vector space and combine in the same way as the $B$'s under addition, as well as under multiplication. They provide a regular representation in $(m+1)\times(m+1)$ matrices of the algebra given by the $B$'s, which are $v\times v$ matrices. In particular, $\mathcal{P}_0=I_{m+1}$.
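The representation by the $(m+1)\times(m+1)$ matrices can be tabulated for a small scheme and (3.8) verified directly. As before, the six-object scheme with groups {0,1}, {2,3}, {4,5} and all helper names are assumptions made for this sketch; the index convention (row $j$, column $i$, matrix $k$) follows the text.

```python
v, m = 6, 2
classes = range(m + 1)

def assoc(a, b):
    """Associate class in a hypothetical scheme with groups {0,1}, {2,3}, {4,5}."""
    if a == b:
        return 0
    return 1 if a // 2 == b // 2 else 2

def p(i, j, k):
    """p^i_{jk}, counted on one representative pair of ith associates."""
    a, b = next((a, b) for a in range(v) for b in range(v) if assoc(a, b) == i)
    return sum(1 for c in range(v) if assoc(a, c) == j and assoc(c, b) == k)

# P[k][j][i] = p^i_{jk}: row index j, column index i, matrix index k.
P = [[[p(i, j, k) for i in classes] for j in classes] for k in classes]

def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(X[r][u] * Y[u][c] for u in range(len(Y))) for c in range(len(Y[0]))]
            for r in range(len(X))]

def rhs(j, k):
    """Right-hand side of (3.8): sum over i of p^i_{jk} P_i."""
    return [[sum(p(i, j, k) * P[i][r][c] for i in classes) for c in classes]
            for r in classes]

representation_ok = all(matmul(P[j], P[k]) == rhs(j, k)
                        for j in classes for k in classes)
identity = [[int(r == c) for c in classes] for r in classes]
zeroth_rows = [P[k][0] for k in classes]   # row 0 of P_k has its 1 in column k
print(representation_ok, P[1], P[2])
```

Here the $6\times 6$ matrices $B_k$ are replaced by $3\times 3$ matrices with the same multiplication table, which is what makes the representation convenient for computation.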

Since the $B$'s are commutative, the $\mathcal{P}$'s are commutative. In general they are
not incidence matrices and are not symmetric. $\mathcal{P}_k$ does not have equal row totals,
but has the same equal column totals $n_k$ as $B_k$. In analogy with (3.3), all elements
of row $j$ of $\sum_k \mathcal{P}_k$ are equal to $n_j$. Let

$B = c_0B_0 + c_1B_1 + \cdots + c_mB_m$

be any element of our algebra, and let $f(\lambda)$ be a polynomial. Then we can express

$f(B) = l_0B_0 + l_1B_1 + \cdots + l_mB_m.$

If

$\mathcal{P} = c_0\mathcal{P}_0 + c_1\mathcal{P}_1 + \cdots + c_m\mathcal{P}_m$

is the representation of $B$, then

$f(\mathcal{P}) = l_0\mathcal{P}_0 + l_1\mathcal{P}_1 + \cdots + l_m\mathcal{P}_m.$

5 It should be noted that these matrices differ from matrices $P_k = (p_{ij}^{k})$ which were
defined in several earlier papers ([4], [5], [6]) but do not have the same algebraic properties.


Let $f(\lambda)$ be the minimum function of $B$ and $\phi(\lambda)$ the minimum function of $\mathcal{P}$.
Then $f(\lambda)$ is the monic polynomial of least degree for which

$f(B) = 0.$

$f(B) = 0 \Rightarrow l_0 = l_1 = \cdots = l_m = 0 \Rightarrow f(\mathcal{P}) = 0;$

i.e., $f(\lambda)$ is divisible by $\phi(\lambda)$.
Similarly $\phi(\lambda)$ is divisible by $f(\lambda)$. Since both are monic polynomials,

$f(\lambda) = \phi(\lambda).$

That is, $B$ and $\mathcal{P}$ have the same distinct characteristic roots, and every matrix
$B$ has at most $m + 1$ distinct characteristic roots, which are solutions of the
minimum equation of $\mathcal{P}$.


4. Applications to combinatorial problems. Association matrices will be used 
to derive some results first obtained in [9] by a longer method. 
The incidence matrix $N = (n_{ij})$ of a design is defined by

$n_{ij} = \begin{cases} 1, & \text{if treatment } i \text{ occurs in block } j, \\ 0, & \text{otherwise.} \end{cases}$

Then

$B = NN' = rB_0 + \lambda_1B_1 + \cdots + \lambda_mB_m;$
$\mathcal{P} = r\mathcal{P}_0 + \lambda_1\mathcal{P}_1 + \cdots + \lambda_m\mathcal{P}_m.$

Also

$C = rI_v - \frac{1}{k}NN' = rB_0 - \frac{1}{k}B,$

where $C$ is the coefficient matrix in the normal equations for estimating the
treatment effects after adjusting for the block effects [6]. Clearly $C$ is a symmetric
matrix. If $e$ is a characteristic root of $C$, then $k(r - e)$ is a characteristic
root of $B$. It is known that $C$ has rank $v - 1$ for a connected design [1]. In this
case (see footnote 6), therefore, 0 is a simple root of $C$ and $rk$ is a simple root of $B$, a fact which
could also be shown directly as follows.


6 Connectedness was assumed implicitly in [9]. 







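The elements of $B$ or $NN'$ are non-negative, and for connected designs $B$ is
irreducible. Also it is easy to verify that the sum of the elements in any row or
column of $B$ is $\sum_i n_i\lambda_i = rk$ (the sum including $i = 0$, with $n_0 = 1$, $\lambda_0 = r$). Hence

$B^* = \frac{1}{rk}B = \frac{1}{rk}NN'$

is a stochastic matrix, which shows that unity is a simple root of $B^*$ and is greater
than all the other roots [7]. Thus $rk$ is a simple root of $B$. The results of Section
3 show that $rk$ is a root of $\mathcal{P}$ and exceeds the other roots. If this root is removed
from $|\mathcal{P} - I\theta| = 0$, then for the case $m = 2$, the other two characteristic roots
of $\mathcal{P}$ will be roots of a quadratic equation which reduces to

$(r - \theta)^2 + [(\lambda_1 - \lambda_2)(p_{12}^{2} - p_{12}^{1}) - (\lambda_1 + \lambda_2)](r - \theta) + [(\lambda_1 - \lambda_2)(\lambda_2 p_{12}^{1} - \lambda_1 p_{12}^{2}) + \lambda_1\lambda_2] = 0,$

on using the identities (3.1) and (3.2). The roots are given by

(4.1)  $r - \theta_1 = \tfrac{1}{2}[(\lambda_1 - \lambda_2)(-\gamma - \sqrt{\Delta}) + (\lambda_1 + \lambda_2)],$
       $r - \theta_2 = \tfrac{1}{2}[(\lambda_1 - \lambda_2)(-\gamma + \sqrt{\Delta}) + (\lambda_1 + \lambda_2)],$

where

$\gamma = p_{12}^{2} - p_{12}^{1}, \quad \beta = p_{12}^{1} + p_{12}^{2}, \quad \Delta = \gamma^2 + 2\beta + 1.$

Therefore,

$|NN' - I\theta| = (rk - \theta)(\theta_1 - \theta)^{\alpha_1}(\theta_2 - \theta)^{\alpha_2}.$

To determine the multiplicities $\alpha_1$ and $\alpha_2$ we note that

$\mathrm{Tr}\, I_v = 1 + \alpha_1 + \alpha_2 = v,$
$\mathrm{Tr}\, NN' = rk + \alpha_1\theta_1 + \alpha_2\theta_2 = vr.$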
Solving and using (4.1),

(4.2)  $\alpha_1 = \dfrac{n_1 + n_2}{2} - \dfrac{(n_1 - n_2) + \gamma(n_1 + n_2)}{2\sqrt{\Delta}},$
       $\alpha_2 = \dfrac{n_1 + n_2}{2} + \dfrac{(n_1 - n_2) + \gamma(n_1 + n_2)}{2\sqrt{\Delta}}.$

Thus the multiplicities $\alpha_1$ and $\alpha_2$ of the roots of $NN'$ are determined in terms of
the parameters of the design. It is striking that, being independent of $r$ and $\lambda_i$,
these multiplicities are the same for all designs having a given association scheme.
This is an instance of some general properties of $\alpha_u$ to be established in Section 6.

For a design to exist, $\alpha_1$ and $\alpha_2$ must be integral. The condition this imposes on
the parameters appearing in (4.2) has been used in studies of the existence and
non-existence of designs [3], [8], [9], [12], [15].
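
As an illustration of how the integrality condition can be checked, the sketch below evaluates (4.2) in exact arithmetic. It is not taken from the paper: the function name `multiplicities` is illustrative, and the test parameters (the triangular schemes with $v = 10$ and $v = 15$, and the cyclic scheme on 5 treatments) are assumed examples. When $\Delta$ is not a perfect square, the multiplicities can be rational only if the numerator $(n_1 - n_2) + \gamma(n_1 + n_2)$ vanishes, and the code treats that case separately.

```python
from fractions import Fraction
from math import isqrt

def multiplicities(n1, n2, p1_12, p2_12):
    """Evaluate (4.2) exactly.

    Arguments are n_1, n_2, p^1_{12}, p^2_{12}. Returns (alpha_1, alpha_2) as
    Fractions, or None when sqrt(Delta) is irrational and does not cancel.
    """
    gamma = p2_12 - p1_12
    beta = p1_12 + p2_12
    delta = gamma * gamma + 2 * beta + 1
    numer = (n1 - n2) + gamma * (n1 + n2)
    half = Fraction(n1 + n2, 2)
    if numer == 0:
        # The term containing sqrt(Delta) drops out; both multiplicities are equal.
        return half, half
    s = isqrt(delta)
    if s * s != delta:
        return None
    shift = Fraction(numer, 2 * s)
    return half - shift, half + shift

def is_admissible(n1, n2, p1_12, p2_12):
    """Necessary condition only: both multiplicities are non-negative integers."""
    result = multiplicities(n1, n2, p1_12, p2_12)
    return result is not None and all(a.denominator == 1 and a >= 0 for a in result)

print(multiplicities(6, 3, 2, 2))   # triangular scheme, v = 10
print(multiplicities(2, 2, 1, 1))   # cyclic scheme, v = 5
```

The check is only a necessary condition; parameter sets that pass it may still fail to correspond to any design.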







5. Applications of algebraic properties of association matrices. In this section
we assume only that $B_i$ ($i = 0, \cdots, m$) are symmetric incidence matrices
satisfying

(5.1)  $B_0 = I_v,$
(5.2)  $\sum_{i=0}^{m} B_i = J_v,$
(5.3)  $B_jB_k = \sum_{i=0}^{m} p_{jk}^{i}B_i,$

for some set of constants $p_{jk}^{i}$. All of the properties of the algebra except commutativity
follow immediately, including its representation in terms of the matrices
$\mathcal{P}_k = (p_{jk}^{i})$. Also, $p_{jk}^{i}$ are elements of products of incidence matrices and must be
non-negative integers. From

$B_k = B_0B_k = \sum_i p_{0k}^{i}B_i$

we deduce the special values

$p_{0k}^{i} = 1$, if $i = k$,
$\phantom{p_{0k}^{i}} = 0$, if $i \ne k$.
The diagonal element in row $t$, column $t$ of $B_jB_k$ may be interpreted as the
number of positions occupied by 1's in row $t$ of $B_j$ as well as in row $t$ of $B_k$.
(5.2) shows that if $k \ne j$ this element is zero. If $k = j$ it is equal to the number
of 1's in row $t$ of $B_j$. The expansions of $B_jB_k$ and $B_jB_j$ given by (5.3) then show
that

$p_{jk}^{0} = 0, \quad j \ne k,$

and that $p_{jj}^{0}$ is equal to each row total of $B_j$. These row totals must therefore
be equal. As a matter of notation set

$p_{jj}^{0} = n_j.$

Row totals in (5.2) show

$\sum_j n_j = v.$

Also

$(\sum_j B_j)B_k = J_vB_k = n_kJ_v = \sum_i n_kB_i,$

and

$\sum_j (B_jB_k) = \sum_j \sum_i p_{jk}^{i}B_i = \sum_i (\sum_j p_{jk}^{i})B_i.$

Hence comparing coefficients,

$\sum_j p_{jk}^{i} = n_k.$
3 







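We now show that commutativity follows from symmetry of $B_i$.

$B_kB_j = B_k'B_j' = (B_jB_k)' = (\sum_i p_{jk}^{i}B_i)' = \sum_i p_{jk}^{i}B_i' = \sum_i p_{jk}^{i}B_i = B_jB_k.$

As a consequence,

$p_{jk}^{i} = p_{kj}^{i}.$

We also deduce

$\mathcal{P}_j\mathcal{P}_k = \mathcal{P}_k\mathcal{P}_j.$

Equating the elements in the $s$th row and $t$th column of $\mathcal{P}_j\mathcal{P}_k$ and $\mathcal{P}_k\mathcal{P}_j$,

(5.4)  $\sum_u p_{sj}^{u}\, p_{uk}^{t} = \sum_u p_{sk}^{u}\, p_{uj}^{t}.$

This relation is equivalent to (3.7). Taking $t = 0$ we get

$\sum_u p_{sj}^{u}\, p_{uk}^{0} = \sum_u p_{sk}^{u}\, p_{uj}^{0},$

$n_k\, p_{sj}^{k} = n_j\, p_{sk}^{j}.$

We have now shown that all the known identities (3.1) follow from the properties
of the algebra which were stated at the beginning of this section. However
the relation

$\mathcal{P}_j\mathcal{P}_k = \mathcal{P}_k\mathcal{P}_j$

leads to new identities when $m > 2$.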
To prove a new identity in the case m = 3, set j7 = s = 
(5.4), giving 
m + pups + pupss + Pups: = PPu + Pipn + Pip - 


We remark that when m = 3, other choices of j, k, s, t lead to relations equiva- 
lent to this one. The use of a smaller number of parameters will make it easier 
to recognize equivalent expressions and will be helpful in simplifying the iden- 
tity. A fairly symmetric set of parameters is the following: 


Nea; Ns; 
1 2 
MP2 = MP, 
3 1 
= WMPu = MP, 
Sasi —_— 3 
= N2p3s = NsgP2 , 
= Ms Oe ea 3 
= MP2 = MPu = WPr. 


Known identities can be used to express all pj, in terms of these parameters, 
whereupon the above identity reduces to 







z 24h 42) 4 0(%+ 24% m—m—m-1) 
(5.5) ™m ™ Ne mH ™ Ng 


Qs a a a 

+ (0% Dg Se ge SOS cass he On + mma ns) = 0. 
mM Ne Ns 

Thus when $n_1, n_2, n_3, a_{12}, a_{23}, a_{13}$ are given, $x$ must satisfy a quadratic equation.
This is a new relation, since known identities (3.1) do not determine $x$ in terms
of the other chosen parameters. An example will illustrate this. Let

$n_i = 8, \quad a_{ij} = 24.$

Then sets of $p_{jk}^{i}$ which satisfy (3.1) are obtained for

$x = 8, 16, 24, 32$ or $40$.

However, (5.5) becomes

$3x^2/8 - 16x + 152 = 0$

and has no integral solutions, showing that the parameter values $n_i = 8$, $a_{ij} = 24$
are impossible.
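
The absence of integral solutions can be confirmed with integer arithmetic. The sketch below is an illustration, not the authors' computation: it clears the denominator of the quadratic above, giving $3x^2 - 128x + 1216 = 0$, and searches for integer roots through the discriminant. The helper name `integer_roots` is an assumption.

```python
from math import isqrt

def integer_roots(a, b, c):
    """Integer solutions of a*x**2 + b*x + c = 0 for integers a != 0, b, c."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    s = isqrt(disc)
    if s * s != disc:
        return []                      # irrational roots
    roots = set()
    for num in (-b - s, -b + s):
        if num % (2 * a) == 0:
            roots.add(num // (2 * a))
    return sorted(roots)

# 3x^2/8 - 16x + 152 = 0, multiplied through by 8
print(integer_roots(3, -128, 1216))

# The five values of x admitted by the known identities, substituted directly
print([3 * x * x - 128 * x + 1216 for x in (8, 16, 24, 32, 40)])
```

The discriminant is $128^2 - 4 \cdot 3 \cdot 1216 = 1792 = 256 \cdot 7$, which is not a perfect square, so the roots are irrational and none of the five candidate values can satisfy the equation.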

An equivalent and perhaps easier way to impose the new necessary conditions
on a given set of parameters is to form the matrix products $\mathcal{P}_j\mathcal{P}_k$ and $\mathcal{P}_k\mathcal{P}_j$
and require that they be identical.

The property of symmetry in the matrices $B_i$ was used in the proof of commutativity
in the algebra, which has been of key importance in the proofs of
several of the foregoing identities. The fact that the elements of $B_i$ are 0's and
1's has been used in determining the special form of the $p_{jk}^{0}$ values but has not
been vital in the algebra or the identities relating $p_{jk}^{i}$. The simple example

$B_0 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad B_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & -1 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad B_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$

shows that matrices with elements other than 0's and 1's may have the same algebraic
behavior as association matrices and may lead to the same identities.
This shows the necessity of the word "incidence" in the following lemma, which
summarizes several results of this section.

LEMMA 5.1. If $B_i$, $i = 0, 1, \cdots, m$ are symmetric incidence matrices satisfying

$B_0 = I_v,$
$\sum_{i=0}^{m} B_i = J_v,$
$B_jB_k = \sum_{i=0}^{m} p_{jk}^{i}B_i$

for some set of constants $p_{jk}^{i}$, then $B_i$ are the association matrices of an association
scheme satisfying (a) to (c) of Section 1.







This lemma provides a useful algebraic method of verifying whether a given 
association relation satisfies the conditions of partial balance. 

Algebraic sufficiency conditions may be used for designs as well as association
schemes. It is easy to verify that an incidence matrix $N$ is the matrix of a PBIB
design if and only if $N$ has equal column totals and

$NN' = rB_0 + \lambda_1B_1 + \cdots + \lambda_mB_m$

for some $m$ and some numbers $r, \lambda_1, \cdots, \lambda_m$, where $B_0, B_1, \cdots, B_m$ satisfy
the conditions of Lemma 5.1. An application of this Lemma will be made in the
proof of the next theorem.

Given an association scheme $\mathcal{A}$ with more than $m$ classes, let the indices of the
associate classes be arranged into disjoint sets $S_0 = \{0\}, S_1, \cdots, S_m$. Define a
new association relation $\mathcal{B}$ in which associate classes correspond to sets $S_i$,
two treatments being defined as $i$th associates in $\mathcal{B}$ if and only if the associate
class of the two treatments in $\mathcal{A}$ corresponds to one of the indices in set $S_i$.

Association relations obtained in this way do not in general satisfy the conditions
of partial balance. Lemma 4.1 of [18] states necessary and sufficient conditions
for partial balance in the case $S_1 = \{1, 2\}$, $S_i = \{i + 1\}$, $i \ge 2$, i.e., the
case in which just two classes are combined. Iteration may give schemes in which
several classes have been combined. However, examples are known [15] in
which a combination of 3 or more classes will give a new scheme with partial
balance while every combination of 2 classes fails, so that the iterative procedure
is impossible. The following generalization is therefore nontrivial.

THEOREM 5.1. Given an association scheme $\mathcal{A}$ with $v$ treatments and parameter
values $q_{\beta\gamma}^{\alpha}$, let an association relation $\mathcal{B}$ with $v$ treatments have classes $0, 1, \cdots, m$
determined by disjoint sets $S_0 = \{0\}, S_1, \cdots, S_m$ of indices of $\mathcal{A}$. In order for
$\mathcal{B}$ to satisfy the conditions of partial balance it is n.a.s. that there exist constants
$p_{jk}^{i}$ such that

(5.6)  $\sum_{\beta \in S_j} \sum_{\gamma \in S_k} q_{\beta\gamma}^{\alpha} = p_{jk}^{i}$

uniformly for $\alpha \in S_i$, and for $i, j, k = 0, 1, \cdots, m$; in this case $\mathcal{B}$ has parameter
values $p_{jk}^{i}$.

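PROOF. We denote incidence matrices of $\mathcal{A}$ by $A_\alpha$ and of $\mathcal{B}$ by $B_i$. From the
definition of $\mathcal{B}$,

$B_i = \sum_{\alpha \in S_i} A_\alpha.$

Lemma 5.1 will now be applied.

Clearly $B_i$ are symmetric incidence matrices, $B_0 = I_v$, and $\sum_i B_i = J_v$;
in order for $\mathcal{B}$ to have partial balance it is thus n.a.s. that there exist constants
$p_{jk}^{i}$ such that

$B_jB_k = \sum_i p_{jk}^{i}B_i.$

Substituting,

$B_jB_k = (\sum_{\beta \in S_j} A_\beta)(\sum_{\gamma \in S_k} A_\gamma) = \sum_{\beta \in S_j} \sum_{\gamma \in S_k} A_\beta A_\gamma = \sum_{\beta \in S_j} \sum_{\gamma \in S_k} \sum_{\alpha} q_{\beta\gamma}^{\alpha}A_\alpha = \sum_{\alpha} (\sum_{\beta \in S_j} \sum_{\gamma \in S_k} q_{\beta\gamma}^{\alpha})A_\alpha.$

Also

$\sum_i p_{jk}^{i}B_i = \sum_i p_{jk}^{i} \sum_{\alpha \in S_i} A_\alpha = \sum_{i=0}^{m} \sum_{\alpha \in S_i} p_{jk}^{i}A_\alpha;$

the coefficient of $A_\alpha$ in this expression has the same value $p_{jk}^{i}$ for every $\alpha \in S_i$.
Equating to the coefficient of $A_\alpha$ in the previous equation we obtain (5.6) as
the n.a.s. conditions on the parameter values $q_{\beta\gamma}^{\alpha}$ of $\mathcal{A}$, completing the proof of
the theorem.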
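
Condition (5.6) lends itself to a mechanical check. The following sketch is an illustration under stated assumptions rather than part of the paper: it takes as the original scheme the 3-class scheme on the vertices of a hexagon, with classes given by cyclic distance, and the function names are hypothetical. `merged_parameters` forms the double sum in (5.6) for each $\alpha$ in $S_i$ and returns the new parameters only if the sum is uniform.

```python
def cycle_classes(v):
    """Class of (x, y) = cyclic distance between x and y on a v-gon."""
    return [[min((x - y) % v, (y - x) % v) for y in range(v)] for x in range(v)]

def intersection_numbers(cls):
    """q[a][b][c] = q^a_{bc}; assumes cls describes an association scheme."""
    m, v = max(cls[0]), len(cls)
    q = [[[0] * (m + 1) for _ in range(m + 1)] for _ in range(m + 1)]
    for a in range(m + 1):
        y = cls[0].index(a)            # treatments 0 and y are a-th associates
        for z in range(v):
            q[a][cls[0][z]][cls[z][y]] += 1
    return q

def merged_parameters(q, sets):
    """Apply (5.6). sets[i] lists the old class indices forming new class i.

    Returns p with p[i][j][k] = p^i_{jk} if the double sum is the same for
    every alpha in S_i, otherwise None.
    """
    m = len(sets) - 1
    p = [[[None] * (m + 1) for _ in range(m + 1)] for _ in range(m + 1)]
    for i, s_i in enumerate(sets):
        for j, s_j in enumerate(sets):
            for k, s_k in enumerate(sets):
                sums = {sum(q[a][b][c] for b in s_j for c in s_k) for a in s_i}
                if len(sums) != 1:
                    return None
                p[i][j][k] = sums.pop()
    return p

q = intersection_numbers(cycle_classes(6))
print(merged_parameters(q, [[0], [1, 3], [2]]) is not None)   # distances 1 and 3 combined
print(merged_parameters(q, [[0], [1], [2, 3]]) is not None)   # distances 2 and 3 combined
```

In this example combining distances 1 and 3 yields a 2-class scheme, while combining distances 2 and 3 does not, because $q_{11}^{2} = 1$ and $q_{11}^{3} = 2$ differ.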


6. Characteristic roots of matrices in the algebras. The procedure used in
Section 4 to determine the multiplicities $\alpha_1$ and $\alpha_2$ is readily generalized to association
schemes with $m$ classes. If $\theta$ is a characteristic root of $B = \sum_{i=0}^{m} c_iB_i$,
where $c_i$ belong to the field of real (or complex) numbers, then $\theta^n$ is a characteristic
root of $B^n$. Also, the trace of any matrix is equal to the sum of its characteristic
roots. This leads to a system of equations in the roots $\theta_u$ of $\mathcal{P} = \sum_{i=0}^{m} c_i\mathcal{P}_i$
and the multiplicities $\alpha_u$ of the same roots of $B$. $\theta_0$ will designate $\sum_{i=0}^{m} c_in_i$,
the common value of the row totals of $B$.

(6.1)  $\alpha_0 + \alpha_1 + \cdots + \alpha_m = \mathrm{Tr}\, I_v,$
       $\alpha_0\theta_0 + \alpha_1\theta_1 + \cdots + \alpha_m\theta_m = \mathrm{Tr}\, B,$
       $\alpha_0\theta_0^2 + \alpha_1\theta_1^2 + \cdots + \alpha_m\theta_m^2 = \mathrm{Tr}\, B^2,$
       $\cdots$
       $\alpha_0\theta_0^m + \alpha_1\theta_1^m + \cdots + \alpha_m\theta_m^m = \mathrm{Tr}\, B^m.$

Equations of this form were used in [9] but were limited to the cases $m \le 4$
because of the difficulty of computing $\mathrm{Tr}\, B^n$ with methods then available.
(3.6) may be used to express $B^n$ in the form

$B^n = c_{0,n}B_0 + c_{1,n}B_1 + \cdots + c_{m,n}B_m.$

Then, since $B_0$ is the only $B_i$ with non-zero diagonal elements,

$\mathrm{Tr}\, B^n = v\,c_{0,n}.$


The right members of the equations are therefore easily computed. The coefficients
of $\alpha_u$ form the Vandermonde matrix with determinant

$\prod_{0 \le j < k \le m} (\theta_k - \theta_j).$

The system will therefore have a unique solution if and only if the $m + 1$ roots
$\theta_u$ are distinct. It will be shown in Corollary 6.2 that this will be the case for at
least some choice of $c_i$.

The solutions $\alpha_u$ must be non-negative integers. If they can be expressed in
terms of the parameters $c_i$ and $p_{jk}^{i}$ this requirement will provide necessary conditions
which the parameters must satisfy in order for matrix $B$ to exist. An
explicit solution of (6.1) requires a general solution of the equation $|\theta I - \mathcal{P}| = 0$,
which may be difficult to obtain for $m > 2$, but one observation may be made
at once. If the basis matrices $B_i$ exist, then matrix $B$ will exist for arbitrary
values of $c_i$, with characteristic roots which obviously occur with integral multiplicities.
This indicates that the integral nature of $\alpha_u$ must be independent of
$c_i$ and dependent only on $p_{jk}^{i}$. Theorem 6.3 will show that this holds not only
for the integral property but for the exact values $\alpha_u$. This is somewhat surprising
in view of the form of (6.1), since the values $\theta_u$ and $\mathrm{Tr}\, B^n$ depend strongly on
$c_i$. The other theorems of this section will give further insight into the nature
of the roots $\theta_u$ and multiplicities $\alpha_u$, as well as simplifying their computation.
Results related to some of these have been obtained independently and by a
different approach in [10].

It was pointed out in Section 4 that $\alpha_0 = 1$ for the matrix $NN'$ if it is irreducible
(which is the case when the design is connected). The same theorems for
stochastic matrices [7] apply to any $B$ with non-negative coefficients $c_i$. In
particular any matrix $B_i$ which is irreducible has $n_i$ as a simple root. It follows
from Theorems 6.1 through 6.3, which we now proceed to prove, that $\theta_0$
is a simple root (i.e., $\alpha_0 = 1$) for any set of coefficients $c_i$ for which

$B = \sum_{i=0}^{m} c_iB_i$

is irreducible.


THEOREM 6.1. Let the characteristic roots of $\mathcal{P}_i$ be $z_{ui}$, $u = 0, 1, \cdots, m$. Then
for a suitable ordering of $z_{ui}$ for each $i$, the characteristic roots of the matrix

$\mathcal{P} = \sum_{i=0}^{m} c_i\mathcal{P}_i$

are given by

(6.2)  $\theta_u = \sum_{i=0}^{m} c_iz_{ui}.$


PROOF. The matrices $\mathcal{P}_0, \cdots, \mathcal{P}_m$ are pairwise commutative. Frobenius'
Theorem ([14], Theorem 16.1) then states that for a suitable ordering of the characteristic
roots $z_{ui}$ of each $\mathcal{P}_i$, and for any rational function

$f(x_0, \cdots, x_m)$

the roots of

$f(\mathcal{P}_0, \cdots, \mathcal{P}_m)$

are given by

$f(z_{u0}, \cdots, z_{um}), \quad u = 0, 1, \cdots, m.$

Also, the ordering of the roots is the same for every rational function $f$. The required
theorem follows by taking

$f(x_0, \cdots, x_m) = \sum_{i=0}^{m} c_ix_i.$


COROLLARY 6.1. The distinct characteristic roots $\theta_u$ of

$B = \sum_{i=0}^{m} c_iB_i$

are given by

$\theta_u = \sum_{i=0}^{m} c_iz_{ui}, \quad u = 0, 1, \cdots, m.$

The problem of finding $\theta_u$ is therefore solved if the values $z_{ui}$ can be found and
ordered. When they are ordered as specified by Theorem 6.1, we define the matrix

$Z = (z_{ui}).$

Since $z_{ui}$ are the characteristic roots of real symmetric matrices $B_i$, $Z$ is a real
matrix.

THEOREM 6.2. $Z = (z_{ui})$ is non-singular.

PROOF. Let

$y_0, y_1, \cdots, y_m$

be a real solution of the system of homogeneous equations

(6.3)  $\sum_{i=0}^{m} z_{ui}y_i = 0, \quad u = 0, 1, \cdots, m.$

This system has coefficient matrix $Z$ and will have a non-zero solution if and only
if $Z$ is singular. Since $Z$ is real there is no loss of generality in taking $y_i$ real.
By Corollary 6.1 the characteristic roots of the matrix

$B = \sum_{i=0}^{m} y_iB_i$

are given by the left side of (6.3) and are therefore all equal to zero. The sum of
all products of roots taken $s$ at a time is thus equal to zero, $s = 1, 2, \cdots, v$;
this sum is equal to the generalized trace $\mathrm{Tr}_s B$, the sum of all $s \times s$ principal
minor determinants of $B$. $B$ is symmetric with diagonal elements $y_0$ and other
elements $y_1, \cdots, y_m$. This follows by noting that among the incidence matrices
$B_0, B_1, \cdots, B_m$ there is one and only one, say $B_i$, for which the element in the
$t$th row and $u$th column is unity, whereas for $B_j$, $j \ne i$, the corresponding element
is zero. Hence $B$ will have $y_i$ in positions which correspond to unities of $B_i$.
In particular the diagonal elements of $B$ will all be $y_0$. Therefore

$\mathrm{Tr}_1 B = vy_0 = 0,$

giving

$y_0 = 0.$

Any element $y_i$ ($i = 1, 2, \cdots, m$) contributes $y_0^2 - y_i^2$ or $-y_i^2$ to $\mathrm{Tr}_2 B$. Since
each row of $B_i$ sums up to $n_i$, the number of unities above the diagonal in $B_i$
is $vn_i/2$. This is also the number of $y_i$'s above the diagonal in $B$. Hence

$\mathrm{Tr}_2 B = -\tfrac{v}{2}(n_1y_1^2 + n_2y_2^2 + \cdots + n_my_m^2) = 0.$

Since $v, n_1, \cdots, n_m$ are positive integers and $y_1, y_2, \cdots, y_m$ are real it follows
that

$y_1 = y_2 = \cdots = y_m = 0.$

Therefore (6.3) has no non-zero solution and $Z$ is non-singular.


COROLLARY 6.2. Given a set of association matrices $B_0, B_1, \cdots, B_m$, any
ordered set of real (or complex) numbers $\theta_0, \cdots, \theta_m$ is the ordered set of distinct
characteristic roots of

$B = c_0B_0 + c_1B_1 + \cdots + c_mB_m$

for a unique set of real (or complex) coefficients $c_i$. In particular, matrices $B$ exist
with $m + 1$ distinct roots.

PROOF. For arbitrary $\theta_0, \cdots, \theta_m$ the system (6.2) can be solved uniquely for
$c_0, \cdots, c_m$.
THEOREM 6.3. If

$B = c_0B_0 + c_1B_1 + \cdots + c_mB_m$

is an element of an algebra with the association matrices $B_i$ as basis, then

$|\theta I - B| = \prod_{u=0}^{m} (\theta - \theta_u)^{\alpha_u},$

where $\alpha_u$ are independent of $c_0, \cdots, c_m$.

PROOF. Let $S$ be an element of the algebra which has $m + 1$ distinct characteristic
roots. Then $S$ does not satisfy any polynomial equation with degree less
than $m + 1$.

$S^0 = B_0 = I,$
$S = b_{01}B_0 + b_{11}B_1 + \cdots + b_{m1}B_m,$
$S^2 = b_{02}B_0 + b_{12}B_1 + \cdots + b_{m2}B_m,$
$\cdots$
$S^m = b_{0m}B_0 + b_{1m}B_1 + \cdots + b_{mm}B_m.$

Since $S$ does not satisfy any equation of degree $m$ or less, these equations must
be independent and can be solved to give each $B_i$ as a linear expression in $S^j$
with constant coefficients. Hence any arbitrary element $B$ can be written

$B = d_0I + d_1S + \cdots + d_mS^m.$

If $\phi$ is a characteristic root of $S$, then the corresponding characteristic root of
$B$ will be

$\theta = d_0 + d_1\phi + \cdots + d_m\phi^m.$

All of the roots $\theta_u$ of $B$ may be obtained in this way by using all of the roots of
$S$. If a root $\phi_u$ has multiplicity $\alpha_u$, then the corresponding $\alpha_u$ roots of $B$ will be
equal. That is, the roots $\theta_u$ of an arbitrary matrix $B$ have the same multiplicities
$\alpha_u$ as the corresponding roots $\phi_u$ of the fixed matrix $S$ and are therefore independent
of the coefficients $c_i$ occurring in $B$.

This completes the proof but an additional remark should be made. The element
$B$ may be such that distinct roots $\phi_u$ lead to the same value $\theta$, whose multiplicity
$\alpha$ will be equal to the sum of two or more $\alpha_u$. In general, if $M$ is a subset
of the set $0, 1, \cdots, m$, $\theta_u = \theta$ for $u \in M$, and $\theta_u \ne \theta$ for $u \notin M$, then

$\alpha = \sum_{u \in M} \alpha_u.$

The statement of the theorem is correct whether $\theta_u$ are distinct or not.

If the roots $z_{ui}$ are obtained separately for each $\mathcal{P}_i$, it may not be immediately
clear what ordering of them is required by Theorem 6.1. However, each $z_{ui}$ is a
root of $B_i$ with multiplicity $\alpha_u$. If the multiplicities are known, a suitable ordering
will then be determined by any ordering of the $\alpha_u$ if the $\alpha_u$ are distinct, and
partially determined if they are not all distinct. Theorem 6.5 will give another
technique for ordering the roots. Theorem 6.4 reveals another significance of the
distinctness or equality of the $\alpha_u$.

THEOREM 6.4. If $t$ and only $t$ of the multiplicities $\alpha_u$ are equal, then for each $i$
the corresponding roots $z_{ui}$ satisfy a monic polynomial equation with integral coefficients
and degree $t$. In particular, if any $\alpha_u$ is distinct from the other multiplicities,
the corresponding roots $z_{ui}$ are rational integers.

PROOF. The term m-polynomial will denote "monic polynomial with integral
coefficients." The characteristic polynomial of a matrix with integral coefficients
is an m-polynomial. Denote the characteristic polynomials of a basis matrix $B_i$
and its representation $\mathcal{P}_i$ by

$f_i(\theta) = |\theta I - B_i| = \prod_{u=0}^{m} (\theta - z_{ui})^{\alpha_u},$

$\phi_i(\theta) = |\theta I - \mathcal{P}_i| = \prod_{u=0}^{m} (\theta - z_{ui}).$

For a particular root $z_{ui}$, let $g(\theta)$ be the m-polynomial of lowest degree $s$ with
$z_{ui}$ as a zero. $g(\theta)$ is irreducible over the rational field. It is determined uniquely
by any of its zeros and any m-polynomial which has any of its zeros is divisible
by $g(\theta)$ and has all of its zeros ([13], Sec. 38). Therefore $f_i(\theta)$ and $\phi_i(\theta)$ are divisible
by $g(\theta)$, which must be the product of $s$ of the linear factors of $\phi_i(\theta)$. Moreover,
the corresponding multiplicities must all have the same value $\alpha_u$; otherwise,
after a certain number of successive divisions of $f_i(\theta)$ by $g(\theta)$ the quotient would
be an m-polynomial which has some of the zeros of $g(\theta)$ but not all. In short,
$f_i(\theta)$ contains $[g(\theta)]^{\alpha_u}$ as a factor and at least $s$ of the multiplicities are equal.
It may happen that the set of distinct irreducible factors with multiplicity $\alpha_u$
includes others along with $g(\theta)$. The product of the factors in the set will be the
polynomial of degree $t$ described in this theorem, where $t$ is the sum of the degree
$s$ of $g(\theta)$ and the degrees of any other factors in the set. Clearly $s \le t$.
If $t = 1$, then $s = 1$ and $g(\theta) = \theta - z_{ui}$. Since $g(\theta)$ has integral coefficients,
it follows in this case that $z_{ui}$ is an integer, $i = 0, \cdots, m$.

Theorems 6.1 and 6.4 are illustrated in the case of $m = 2$ associate classes by
expressions (4.1) for the roots $\theta_1$ and $\theta_2$, the roles of $c_0, c_1, c_2$ being played by
$r, \lambda_1, \lambda_2$. Although in general the roots of a quadratic equation are irrational
functions of the coefficients and although $\lambda_1$ and $\lambda_2$ occur several times in the
coefficients, the roots in this case are linear polynomials in $r, \lambda_1, \lambda_2$, with coefficients
that are rational if and only if the integer $\Delta$ is a perfect square. It is shown
in [9] that if $\alpha_1 \ne \alpha_2$ it is in fact necessary that $\Delta$ be a perfect square, implying
that the roots are rational. The additional fact that they are integers is not obvious
from (4.1). It is further shown in [9] that if $\alpha_1 = \alpha_2$ it is possible that $\Delta$
will not be a square and that the roots will be irrational. This is precisely the
case in known designs of cyclic type.

THEOREM 6.5. For fixed $u = 0, \cdots, m$, the roots $z_{ui}$ satisfy the relations

(6.4)  $z_{uj}z_{uk} = \sum_{i=0}^{m} p_{jk}^{i}z_{ui}.$

PROOF. The relation is proved by applying Frobenius' theorem to both sides
of (3.8).
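
Relation (6.4) can also serve as a test of whether a proposed table of roots is ordered consistently. The sketch below is a hedged illustration: it assumes the triangular scheme with $v = 10$, $n_1 = 6$, $n_2 = 3$ and uses the commonly quoted characteristic roots of its association matrices (6, 1, -2 for the first class and 3, -2, 1 for the second), which are not taken from this paper. The names `p`, `Z` and `satisfies_6_4` are illustrative.

```python
N = [1, 6, 3]                                   # n_0, n_1, n_2
P_INNER = {1: [[3, 2], [2, 1]],                 # p^1_{jk}, j, k = 1, 2
           2: [[4, 2], [2, 0]]}                 # p^2_{jk}, j, k = 1, 2

def p(i, j, k):
    """p^i_{jk} for all indices 0..2, using the special values for index 0."""
    if j == 0:
        return int(i == k)
    if k == 0:
        return int(i == j)
    if i == 0:
        return N[j] if j == k else 0
    return P_INNER[i][j - 1][k - 1]

# Row u holds (z_u0, z_u1, z_u2); assumed roots, ordered as Theorem 6.1 requires.
Z = [[1, 6, 3],
     [1, 1, -2],
     [1, -2, 1]]

def satisfies_6_4(z, m=2):
    """True if z_uj * z_uk == sum_i p^i_{jk} * z_ui for every u, j, k."""
    return all(
        z[u][j] * z[u][k] == sum(p(i, j, k) * z[u][i] for i in range(m + 1))
        for u in range(m + 1) for j in range(m + 1) for k in range(m + 1)
    )

print(satisfies_6_4(Z))

# Pairing the roots of the two classes in the wrong order violates (6.4).
Z_misordered = [[1, 6, 3], [1, 1, 1], [1, -2, -2]]
print(satisfies_6_4(Z_misordered))
```

The second call shows why the ordering matters: each column still lists the correct roots, but the rows no longer correspond to common characteristic vectors.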

It is interesting to note the amount of simplification that has now been made
in the study of a matrix of the algebra, for example the matrix

$NN' = rB_0 + \lambda_1B_1 + \cdots + \lambda_mB_m$

of a design. The characteristic equation of this matrix is of degree $v$. The regular
representation introduced in Section 3 reduces its solution to the solution of an
equation of degree $m + 1$. The theorems of this section show that the characteristic
roots are linear combinations of $r, \lambda_1, \cdots, \lambda_m$ and that the multiplicities
are entirely independent of these parameters, depending only on the association
scheme. The coefficients of $r, \lambda_1, \cdots, \lambda_m$ are $z_{ui}$, the characteristic roots of the
matrices $\mathcal{P}_i$, which also depend only on the association schemes. In some cases
the $z_{ui}$ can be shown to be integers and in any case they satisfy the system of
quadratic equations (6.4). Once $z_{ui}$ values are found for some of the matrices
$\mathcal{P}_0, \cdots, \mathcal{P}_m$, the equations (6.4) may be particularly useful, not only permitting
an easy determination of the remaining $z_{ui}$, but giving them in the order
required by Frobenius' theorem and used in Theorem 6.1.

The matrix $Z = (z_{ui})$ seems deserving of further study. As an indication of
its usefulness we make the following remark:








$\sum_{u=0}^{m} \alpha_uz_{ui} = \mathrm{Tr}\, B_i = v, \quad i = 0,$
$\phantom{\sum_{u=0}^{m} \alpha_uz_{ui} = \mathrm{Tr}\, B_i} = 0, \quad i = 1, 2, \cdots, m.$

This is equivalent to the system of equations

$(\alpha_0, \alpha_1, \cdots, \alpha_m)Z = (v, 0, 0, \cdots, 0)$

providing an alternative to (6.1) for determining $\alpha_u$.
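
As a small worked illustration of this linear system, the sketch below solves it by Gauss-Jordan elimination in exact rational arithmetic. It is not from the paper: the function name `solve_multiplicities` is illustrative, and the sample matrix of roots is an assumed example (the triangular scheme with $v = 10$, with the rows ordered as required by Theorem 6.1). The elimination relies on the matrix of roots being non-singular, as Theorem 6.2 asserts.

```python
from fractions import Fraction

def solve_multiplicities(Z, v):
    """Solve (alpha_0, ..., alpha_m) Z = (v, 0, ..., 0) for the multiplicities.

    Z[u][i] is the root z_ui. Column i of Z gives one equation
    sum_u alpha_u * z_ui = v if i == 0 else 0.
    """
    n = len(Z)
    rows = [[Fraction(Z[u][i]) for u in range(n)] + [Fraction(v if i == 0 else 0)]
            for i in range(n)]
    for col in range(n):
        # A non-zero pivot exists because Z is non-singular.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[u][n] for u in range(n)]

Z_example = [[1, 6, 3],
             [1, 1, -2],
             [1, -2, 1]]
print(solve_multiplicities(Z_example, 10))
```

Because the arithmetic is exact, a non-integral entry in the result would signal directly that the assumed parameters cannot belong to an association scheme.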
The authors are thankful to W. S. Connor and Karl Goldberg for several stimulating
discussions during the preparation of this paper.


REFERENCES

[1] R. C. Bose, "Least square aspects of analysis of variance," University of North Carolina, Institute of Statistics Mimeograph Series No. 9.
[2] R. C. Bose, "Versuche in unvollständigen Blöcken," Gastvorlesung Universität Frankfurt/M., Naturwissenschaftliche Fakultät, 1955.
[3] R. C. Bose and W. H. Clatworthy, "Some classes of partially balanced designs," Ann. Math. Stat., Vol. 26 (1955), pp. 212-232.
[4] R. C. Bose, W. H. Clatworthy and S. S. Shrikhande, "Tables of partially balanced incomplete block designs with two associate classes," North Carolina Agricultural Experimental Station Technical Bulletin No. 107, 1954.
[5] R. C. Bose and K. R. Nair, "Partially balanced incomplete block designs," Sankhyā, Vol. 4 (1939), pp. 337-372.
[6] R. C. Bose and T. Shimamoto, "Classification and analysis of partially balanced incomplete block designs with two associate classes," J. Amer. Stat. Assn., Vol. 47 (1952), pp. 151-184.
[7] Alfred Brauer, "Limits for the characteristic roots of a matrix. IV: Applications to stochastic matrices," Duke Math. J., Vol. 19 (1952), pp. 75-91.
[8] W. H. Clatworthy, "Contributions on partially balanced incomplete block designs with two associate classes," National Bureau of Standards Applied Math. Series No. 47, 1956.
[9] W. S. Connor and W. H. Clatworthy, "Some theorems for partially balanced designs," Ann. Math. Stat., Vol. 25 (1954), pp. 100-112.
[10] E. C. Dade and K. Goldberg, "Incidence algebras," to be published.
[11] D. König, Theorie der Endlichen und Unendlichen Graphen, Chelsea, 1950.
[12] R. G. Laha and J. Roy, "Two associate partially balanced designs involving three replications," Sankhyā, Vol. 17 (1956), pp. 175-184.
[13] C. C. MacDuffee, An Introduction to Abstract Algebra, Wiley, 1940.
[14] C. C. MacDuffee, The Theory of Matrices, Chelsea, 1946.
[15] D. M. Mesner, "An investigation of certain combinatorial properties of partially balanced incomplete block experimental designs and association schemes, with a detailed study of designs of Latin square and related types," unpublished doctoral thesis, Michigan State University, 1956.
[16] I. C. Ross and F. Harary, "On the determination of redundancies in sociometric chains," Psychometrika, Vol. 17 (1952), pp. 195-208.
[17] W. A. Thompson, Jr., "On the ratio of variances in the mixed incomplete block model," unpublished doctoral thesis, University of North Carolina, 1954.
[18] M. N. Vartak, "On an application of Kronecker product of matrices to statistical designs," Ann. Math. Stat., Vol. 26 (1955), pp. 420-438.
[19] W. A. Thompson, Jr., "A note on P. B. I. B. design matrices," Ann. Math. Stat., Vol. 29 (1958), pp. 919-922.





ON A CHARACTERIZATION OF THE TRIANGULAR 
ASSOCIATION SCHEME! 


By S. S. SHRIKHANDE
University of North Carolina 


1. Introduction. A partially balanced incomplete block design with two 
associate classes [1] is said to be triangular [2] if the number of treatments is 
v = n(n — 1)/2 and the association scheme is an array of n rows and n columns 
with the following properties: 

(a) The positions in the principal diagonal are blank. 

(b) The n(n — 1)/2 positions above the principal diagonal are filled by the 
numbers 1, 2, --- , n(n — 1)/2 corresponding to the treatments. 

(c) The array is symmetric about the principal diagonal. 

(d) For any treatment $x$ the first associates are exactly those treatments which
lie in the same row and same column as $x$.

It is then obvious that in the notation of [1]

(1) the number of first associates of any treatment is $n_1 = 2n - 4$;

(2) with respect to any two treatments $x_1$ and $x_2$ which are first associates
(denoted by $(x_1, x_2) = 1$), the number of treatments which are first associates of
both $x_1$ and $x_2$ is

$p_{11}^{1}(x_1, x_2) = n - 2;$

(3) with respect to any two treatments $x_3$ and $x_4$ which are second associates
(denoted by $(x_3, x_4) = 2$) the number of treatments which are first associates of
both $x_3$, $x_4$ is

$p_{11}^{2}(x_3, x_4) = 4.$
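
Properties (1) to (3) can be verified directly from conditions (a) to (d) for any particular $n$. The following sketch is an illustration only: it fills the array row by row above the diagonal, which is one admissible numbering of the treatments, and the function names are hypothetical.

```python
from itertools import combinations

def triangular_array(n):
    """n x n symmetric array, diagonal blank (None), treatments 1..n(n-1)/2 above it."""
    arr = [[None] * n for _ in range(n)]
    for t, (i, j) in enumerate(combinations(range(n), 2), start=1):
        arr[i][j] = arr[j][i] = t
    return arr

def first_associates(arr):
    """Treatments lying in the same row or the same column as each treatment."""
    n = len(arr)
    assoc = {}
    for i in range(n):
        for j in range(i + 1, n):
            x = arr[i][j]
            # x sits in row i and column j; by symmetry column j equals row j.
            assoc[x] = (set(arr[i]) | set(arr[j])) - {x, None}
    return assoc

def check_parameters(n):
    """Return the truth of properties (1), (2), (3) for the given n."""
    assoc = first_associates(triangular_array(n))
    prop1 = all(len(s) == 2 * n - 4 for s in assoc.values())
    prop2 = all(len(assoc[x] & assoc[y]) == n - 2
                for x in assoc for y in assoc[x])
    prop3 = all(len(assoc[x] & assoc[y]) == 4
                for x in assoc for y in assoc
                if y != x and y not in assoc[x])
    return prop1, prop2, prop3

for row in triangular_array(5):
    print(row)
print(check_parameters(5))
```

The converse question, whether (1) to (3) force the array structure, is the subject of the paper and is not addressed by this check.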


In an interesting paper Connor [3] has shown that if $n \ge 9$, (1), (2) and (3)
above imply (a), (b), (c) and (d), i.e., the association scheme is triangular. In
this paper we derive a theorem and utilize it to prove that Connor's result is true
for the cases $n = 5, 6$.


2. A characterization of the triangular association scheme. We prove a 
theorem which is equivalent to the theorem proved by Connor [3]. 

THEOREM: A necessary and sufficient condition that a partially balanced incomplete
block design with two associate classes for $n(n - 1)/2$ treatments with
$p_{11}^{1}(x_1, x_2) = n - 2$, where $(x_1, x_2) = 1$, has a triangular association scheme, is
that all the first associates of any treatment $x$ whatsoever can be divided into two
sets $(y_1, y_2, \cdots, y_{n-2})$ and $(z_1, z_2, \cdots, z_{n-2})$ such that $(y_i, y_j) = (z_i, z_j) = 1$
for $i \ne j = 1, 2, \cdots, n - 2$.

Received July 19, 1958.

1 This research was supported by the United States Air Force through the Air Force
Office of Scientific Research of the Air Research and Development Command, under Contract
No. AF 18(600)-83. Reproduction in whole or in part is permitted for any purpose of
the United States Government.

PROOF. Necessity is obvious. We now prove the sufficiency.

Since $y_i$ has $(n - 3)$ first associates $y_j$ and $p_{11}^{1}(x, y_i) = n - 2$, $y_i$ has just one
treatment from the other set, say $z_i$, such that $(y_i, z_i) = 1$ and $(y_i, z_j) = 2$ for
$j \ne i$. Now suppose that $(y_{i_1}, z_i) = 1$ for $i_1 \ne i$. Then $z_i$ has $y_i$, $y_{i_1}$ and $z_j$,
$j \ne i$, for its first associates giving the value $p_{11}^{1}(x, z_i) = n - 1$ which is a contradiction.
Hence we can pair off the treatments of the two sets such that

$(y_i, z_i) = 1, \quad (y_i, z_j) = 2, \quad i \ne j = 1, 2, \cdots, n - 2.$

We will use this fact repeatedly. Further it is obvious that if the first associates
of any treatment can be divided into sets as above, this division into two sets can
be done in a unique manner.

For simplicity let us assume that the first associates of 1 are given by the two
sets,

(2,   3,     ...,   (n - 1))   and
(n,   n + 1, ...,   (2n - 3)),

where any two treatments in the same column are first associates and two treatments
from different columns are second associates. We will adopt this method
of writing to indicate the relationship of the two treatments from different sets.
We now write the rows,

*   1   2   3       ...   (n - 1)
1   *   n   n + 1   ...   (2n - 3),

where * denotes that the corresponding position is blank.
Now amongst the treatments occurring so far the first associates of 2 are
1, 3, 4, ..., n - 1 and n. Let the remaining first associates be (2n - 2), (2n - 1),
..., (3n - 6). Assume without loss of generality that

(3, 2n - 2) = (4, 2n - 1) = ... = (n - 1, 3n - 6) = 1;

then we form the third row by putting 2, n and * in the first three positions
respectively and placing (2n - 2), (2n - 1), ..., (3n - 6) below 3, 4, ...,
(n - 1) respectively. Thus, we have the three following rows:

*   1   2   3        4        ...   (n - 1)
1   *   n   n + 1    n + 2    ...   (2n - 3)
2   n   *   2n - 2   2n - 1   ...   (3n - 6).

We note that * occurs in the principal diagonal positions and the array written so
far is symmetric and that the new first associates of 2 are written after the position
of the * in the third row.

Now consider the first associates of treatment 3. The only treatments till now 
which are first associates of 3 are 1, 2,4, ---,n — 1,n + 1, 2n — 2. Let the 
remaining (n — 4) first associates be (8n — 5), (8n — 4),--- , (4n — 10). The 





TRIANGULAR ASSOCIATION SCHEME 


two sets of first associates of 3 are 


(1, 2 4, -+» (n — 1)) 
and 


(n+1, 2n—2, 3n—5, 3n—4, --- (4n — 10)), 


where we can assume without loss of generality that the treatments in the same 
column are first associates. We now write down the fourth row to give 


*    1        2         3         4         ...   (n − 1) 
1    *        n         n + 1     n + 2     ...   (2n − 3) 
2    n        *         2n − 2    2n − 1    ...   (3n − 6) 
3    n + 1    2n − 2    *         3n − 5    ...   (4n − 10). 


The same method can be used to write down the other arrays corresponding to 
4, 5, ..., (n − 1) respectively. It is easy to see that all the positions above the 
principal diagonal are filled in with the numbers 1, 2, ..., n(n − 1)/2 occurring 
just once. Thus, conditions (a), (b) and (c) are satisfied. Further, any treatment 
x occurs just in one position above the principal diagonal, say in row i and column 
j (≠ i). Then, it also occurs in row j and column i. Hence, the first associates of x 
are all the treatments of row i and all the treatments of row j. By symmetry the 
treatments of column j are exactly those occurring in row j. Hence, the first 
associates of x are exactly those treatments which occur in the same row and 
same column as x. Thus (d) is also satisfied. Hence the association scheme is 
triangular. This completes the proof. 
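
The array built in the proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the positions above the diagonal are filled row by row (which reproduces the first rows displayed above), and the script checks, for n = 5, that every treatment has 2(n − 2) first associates and that the number of common first associates of a pair depends only on whether the pair are first or second associates.

```python
from itertools import combinations

def triangular_array(n):
    """Symmetric n x n array with a blank (None) diagonal; the positions above
    the diagonal hold 1, 2, ..., n(n-1)/2 row by row, as in the proof."""
    array = [[None] * n for _ in range(n)]
    label = 1
    for i in range(n):
        for j in range(i + 1, n):
            array[i][j] = array[j][i] = label
            label += 1
    return array

def first_associates(array):
    """Treatments in the same row (equivalently, column) are first associates."""
    v = len(array) * (len(array) - 1) // 2
    first = {t: set() for t in range(1, v + 1)}
    for row in array:
        entries = [t for t in row if t is not None]
        for s, t in combinations(entries, 2):
            first[s].add(t)
            first[t].add(s)
    return first

def common_first_associates(first):
    """Values taken by the number of common first associates of a pair, split
    by whether the pair are first (p^1_11) or second (p^2_11) associates."""
    p1_11, p2_11 = set(), set()
    for s, t in combinations(sorted(first), 2):
        target = p1_11 if t in first[s] else p2_11
        target.add(len(first[s] & first[t]))
    return p1_11, p2_11

array = triangular_array(5)
first = first_associates(array)
n1_values = {len(assoc) for assoc in first.values()}
p1_11, p2_11 = common_first_associates(first)
print(array[1])                     # second row: 1, blank, then n, n + 1, ...
print(n1_values, p1_11, p2_11)
```

Each of the three printed sets contains a single value when the scheme is a genuine association scheme; calling `triangular_array` with another n repeats the check for that case.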

As stated previously, this theorem is equivalent to one given by Connor [3]. 
In the present form, however, it is more directly useful. 


3. Uniqueness of the triangular scheme for n = 5. LEMMA 1. The first associates 
of any treatment whatsoever for the design with parameters, 


(3.1)    $v = 10, \quad n_1 = 6, \quad n_2 = 3,$ 

         $(p^1_{jk}) = \begin{pmatrix} 3 & 2 \\ 2 & 1 \end{pmatrix}, \quad (p^2_{jk}) = \begin{pmatrix} 4 & 2 \\ 2 & 0 \end{pmatrix},$ 


can be divided into two sets of three each such that any two treatments of the same set 
are first associates. 

PROOF: Assume that first associates of 1 are 2, 3, 4, 5, 6, and 7 of which 3, 4 
and 5 are first associates of 2, and 6 and 7 are second associates of 2. We then 
have 


(1, 2) = (1, 3) = (1, 4) = (1, 5) = (1, 6) = (1,7) = 1 

(2, 3) = (2,4) = (2,5) = 1 

(2, 6) = (2,7) = 2. 
We show that (6, 7) = 1. Suppose not, then (6, 7) = 2 and 2 is second associate 
of both 6 and 7 contradicting $p^2_{22}(6, 7) = 0$. Thus, we must have (6, 7) = 1. 





S. S. SHRIKHANDE 


Consider the pair (1, 6) = 1. 7 is first associate and 2 is second associate of 6. 
Hence 6 has two first associates and one second associate from the set (3, 4, 5). 
Assume that (6, 4) = 2 and hence, (6, 3) = (6, 5) = 1. Similarly, 7 has two first 
associates and one second associate from the set (3, 4, 5). We show that 4 cannot 
be second associate of 7. For if (7, 4) = 2, then (6, 7) = 1 and 2 and 4 are common 
second associates of both 6 and 7, contradicting $p^1_{22}(6, 7) = 1$. Hence, we 
must have (7, 4) = 1, and hence 7 has one first associate and one second associate 
from the set (3, 5). We can assume without loss of generality that (7, 3) = 2, 
(7, 5) = 1. Then we have the set (5, 6, 7) such that any two treatments of the set 
are first associates. Now we consider the set (2, 3, 4). We already know that 
(2, 3) = (2, 4) = 1. We now show that (3, 4) = 1. Now the common first associates 
of 1 and 5 are 2, 6, 7. Hence, 3 and 4 are second associates of 5. Hence, we 
must have (3, 4) = 1 in the same way as we obtained (6, 7) = 1 above. Thus, 
(2, 3, 4) is another set of first associates of 1 such that any two members of the 
set are first associates. A similar result is true for any other treatment. This 
proves the Lemma. 

An appeal to the theorem now gives the corollary: 

COROLLARY. A partially balanced design with parameters (3.1) has triangular 
association scheme. 


4. Uniqueness of the triangular association scheme for n = 6. LEMMA 2. The 
first associates of any treatment whatsoever for the design with parameters, 


(4.1)    $v = 15, \quad n_1 = 8, \quad n_2 = 6,$ 

         $(p^1_{jk}) = \begin{pmatrix} 4 & 3 \\ 3 & 3 \end{pmatrix}, \quad (p^2_{jk}) = \begin{pmatrix} 4 & 4 \\ 4 & 1 \end{pmatrix},$ 


can be broken up into two sets of four each such that any two treatments of the same 
set are first associates. 

PROOF: Assume without loss of generality that the second associates of treatment 
1 are the treatments 10, 11, 12, 13, 14, 15, and those of 10 are 1, 4, 5, 8, 9, 
15, so that 15 is the only common second associate (since $p^2_{22} = 1$) of both 1 
and 10. Hence, any two treatments of the set (1, 10, 15) are second associates. 
Then, considering the pairs (1, 10) and (1, 15), it is easy to see that 11, 12, 13, 14 
are first associates of both 10 and 15. Now, (1, 11) = 2 and 10 and 15 are first 
associates of 11. Hence, from the value $p^2_{12}(1, 11) = 4$, we see that 11 has two 
first associates from the set (12, 13, 14). Let these be 12 and 13 so that 


(11, 12) = (11, 13) = 1,    (11, 14) = 2. 


Now, (1, 14) = 2 and, as before, 14 has two first associates from the set (11, 12, 
13). These are obviously 12 and 13, since (14, 11) = 2. Hence, we have 


(12, 14) = (13, 14) = 1. 


Similarly considering the pair (1, 12) and noting that (12, 11) = (12, 14) = 
1, we get (12, 13) = 2. 







All the above information can be easily read by writing the second associates 
of 1 in the following scheme: 


S_1:    1    10    15 

             | 11 
             | 14 

             | 12 
             | 13 


The explanation of the scheme is as follows: Treatments 10, 11, ..., 15 are 
second associates of 1, where 11, 12, 13, 14 are first associates and 15 is a 
second associate of 10. We write 10 and 15 in the row in the second and third 
positions respectively. Treatments 11, 12, 13, 14 are also first associates of 15. 
Further any two treatments of the set (1, 10, 15) are second associates. Of the 
six pairs from the set (11, 12, 13, 14), only those marked by straight lines on the 
left are second associates, while the remaining four pairs are first associates. The 
relations implied by S_1 are written completely as follows: 


(4.2) (1, 10) = (1, 11) = (1, 12) = (1, 13) = (1, 14) = (1, 15) = 2 
(10,15) = 2, (10, 11) = (10, 12) = (10, 13) = (10, 14) = 1 
(15, 11) = (15, 12) = (15, 13) = (15, 14) = 1 
(11,14) =2, (11,12) = (11,13) = 1 
(14, 12) = (14,13) = 1, (12,13) = 2. 


Now, among the seven treatments above the only second associates of 15 are 
1 and 10. Let the remaining four second associates of 15 be 2, 3, 6, 7. Then, as 
before 2, 3, 6, 7 are first associates of both 1 and 10. Without loss of generality 
assume that (2, 7) = (3, 6) = 2 and hence, (2, 3) = (2, 6) = (7, 3) = (7, 6) = 1. 
Hence, we can represent the second associates of 15 in the following scheme: 


S_2:    15    1    10 

              | 2 
              | 7 

              | 3 
              | 6 


The new relations implied by S_2 are 


(4.3) (15, 2) = (15, 3) = (15, 6) = (15, 7) = 2 
(1, 2) = (1, 3) = (1, 6) = (1,7) = 1 
(10, 2) = (10, 3) = (10, 6) = (10,7) = 1 
(2, 7) = 2, (2,3) = (2,6) = 1 
(7,3) = (7, 6) = 1, (3, 6) = 2. 
We now consider the relation of any treatment from the set (2, 3, 6, 7) with 
any treatment of the set (11, 12, 13, 14). 
Now, (1, 2) = 1 and 10, 15 are, respectively, first and second associates of 2. 







Hence, from the value $p^1_{12}(2, 1) = 3$ and $p^1_{22}(2, 1) = 3$, we see that 2 has exactly 
two first associates and exactly two second associates from the set (11, 12, 13, 
14). Suppose we have (2, 11) = (2, 14) = 1 and hence, (2, 12) = (2, 13) = 2. 
Then, since (12, 13) = 2 and the common second associates of both 12 and 13 
are 1 and 2, we get $p^2_{22} = 2$. Hence, a contradiction. We get a similar contradiction 
if we assume that 11, 14 are second associates of 2. Hence, the only possible 
case is that 2 has just one first associate and just one second associate from each 
set (11, 14) and (12, 13). We can assume without loss of generality that 


(4.4) (2,11) = (2,12) =1, (2,14) = (2,13) =2. 


Now, consider the pair (15, 11) = 1. Here 1 and 10 are respectively second and 
first associates of 11. Hence, as before, of the remaining four second associates of 
15, i.e., 2, 3, 6, 7, exactly two are first associates and exactly two are second 
associates of 11. A similar argument shows that 11 has exactly one first associate 
and exactly one second associate from the sets (2, 7) and (3, 6). But, we already 
have (11, 2) = 1, and hence we must have 


(4.5) (11, 7) = 2. 

A similar argument considering the pair (1, 7) = 1 shows that 
(4.6) (7, 14) = 1. 

In the same manner we also get 

(4.7) (7, 13) = 1, (7, 12) = 2. 


We, thus, get the relationship of any treatment from the set (2, 7) with any 
treatment of the set (11, 12, 13, 14). A similar argument shows that 11 has just 
one first associate and just one second associate from the set (3, 6). Without loss 
of generality we can assume that 


(4.8) (11,6) =1, (11,3) =2, 

(14,3)=1, (14,6) = 2. 
Now, the relationship of 3, 6 with 12, 13 remains to be determined. Obviously, 
we have the two following possibilities: Either 


(4.9) (A): (6, 12) = (3,13) = 1, (6,13) = (3, 12) = 2, 
or 
(4.10) (B): (6, 12) = (3,13) = 2, (6,13) = (3,12) = 1. 


We now proceed to show that case (B) is impossible. 

Amongst the eleven treatments occurring so far, the only second associates of 
10 are 1 and 15. Hence, the remaining second associates of 10 are 4, 5, 8, 9. Of 
the six possible pairs just two of them are second associates. Assume without loss 
of generality that 


(4.11)    (4, 9) = (5, 8) = 2 
          (4, 5) = (4, 8) = (9, 5) = (9, 8) = 1 


and represent this by 


[Scheme for the second associates of 10, with the pairs (4, 9) and (5, 8) marked as second associates; the layout is not recoverable from the source.] 


TABLE 1 

[Columns 2, 3 and 4 are headed "First Associates" and columns 5, 6 and 7 "Second Associates"; the entries of the table are not recoverable from the source.] 


Also, we can assume without loss of generality by considering the pair (10, 11) = 
1, that 


(4.12)    (11, 4) = (11, 8) = (14, 9) = (14, 5) = 1 
          (11, 9) = (11, 5) = (14, 4) = (14, 8) = 2. 
We note that the relations (4.11) and (4.12) do not depend in any manner on the 
relation (B). We summarize the information given by (4.2), --- , (4.8), (4.10), 
(4.11) and (4.12) in Table 1 in columns 1, 2 and 5. 

We now consider the possible relationship of the treatments 12, 13 with the 
treatments 4, 5, 8, 9. We have the four possible cases, 

(i)      (12, 4) = (12, 8) = (13, 9) = (13, 5) = 1 
         (12, 9) = (12, 5) = (13, 4) = (13, 8) = 2, 

(ii)     (12, 9) = (12, 5) = (13, 4) = (13, 8) = 1 
         (12, 4) = (12, 8) = (13, 9) = (13, 5) = 2, 

(iii)    (12, 8) = (12, 9) = (13, 4) = (13, 5) = 1 
         (12, 4) = (12, 5) = (13, 9) = (13, 8) = 2, 

(iv)     (12, 8) = (12, 9) = (13, 4) = (13, 5) = 2 
         (12, 4) = (12, 5) = (13, 9) = (13, 8) = 1. 






Of these case (i) is impossible, since otherwise from Table 1 and columns 1 and 
2 we see that (11, 12) = 1 and 11 and 12 would have five common first associates 
contradicting $p^1_{11} = 4$. Similarly, case (ii) gives (11, 12) = 1 and $p^1_{11}(11, 12) = 3$. 
We are thus left with only case (iii) and case (iv). 

We now consider case (iii). The information given by this is entered in columns 
3 and 6. We now consider the possible relationships of 2 and 7 with 4, 5, 8, 9. 
We have the following cases to be considered: 


(α)    (2, 9) = (2, 5) = (7, 4) = (7, 8) = 1 
       (2, 4) = (2, 8) = (7, 5) = (7, 9) = 2, 

(β)    (2, 9) = (2, 5) = (7, 4) = (7, 8) = 2 
       (2, 4) = (2, 8) = (7, 5) = (7, 9) = 1, 

(γ)    (2, 9) = (2, 8) = (7, 4) = (7, 5) = 1 
       (2, 4) = (2, 5) = (7, 9) = (7, 8) = 2, 

(δ)    (2, 9) = (2, 8) = (7, 4) = (7, 5) = 2 
       (2, 4) = (2, 5) = (7, 9) = (7, 8) = 1. 


Referring to Table 1 and columns 1, 2, 3, 5 and 6 we see that (14, 2) = 2, and 
cases (α) and (β) give $p^2_{22}(14, 2) = 2$ and 0 respectively, giving a contradiction 
since $p^2_{22} = 1$. Similarly, (13, 2) = 2 and (γ) and (δ) give $p^2_{22}(13, 2) = 0$ and 2 
respectively, again a contradiction. Hence, we see that case (iii) is impossible. 

We now suppress the information in columns 3 and 6 and put down the information 
given by case (iv) in columns 4 and 7. With case (iv) we again consider 
the cases (α), (β), (γ) and (δ). We now look up columns 1, 2, 4, 5, 7 of Table 1. 
Again, (14, 2) = 2 and (α) and (β) give $p^2_{22}(14, 2) = 2$ and 0 respectively. Similarly, 
(13, 2) = 2 and (γ) and (δ) give $p^2_{22}(13, 2) = 2$ and 0 respectively. Hence, 
a contradiction again. Thus, case (iv) is also impossible. It is now clear that 
case (B) is impossible, and we are left with case (A) alone. The relations (4.2), 
..., (4.9) now give the following two sets of first associates of treatment 10: 


(11, 12, 2, 6) 
and 


(14, 13, 7, 3), 


where any two treatments from each of the two sets are first associates. 

A similar result can be proved for any treatment x by considering its two sec- 
ond associates y and z where (y, z) = 2 and taking the four remaining second 
associates of y and z which will be the eight first associates of x. This completes 
the proof of Lemma 2. 

The application of the theorem now gives the corollary. 

COROLLARY. A design with parameters (4.1) has triangular association scheme. 
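
The property asserted by Lemmas 1 and 2 can be observed directly on the triangular schemes themselves. The sketch below is an illustration only, not part of the proof: the lemmas concern any scheme with the stated parameters, whereas the code merely inspects the triangular scheme, with treatments represented (as an assumption of this sketch) by unordered pairs of n symbols. It counts, for each treatment, the ways of splitting its first associates into two sets of n − 2 mutual first associates.

```python
from itertools import combinations

def triangular_first_associates(n):
    # Treatments are the unordered pairs {i, j}; two treatments are first
    # associates when they share exactly one symbol (same row or column).
    treatments = list(combinations(range(n), 2))
    return {s: {t for t in treatments if len(set(s) & set(t)) == 1}
            for s in treatments}

def is_clique(members, first):
    """True when any two treatments of `members` are first associates."""
    return all(t in first[s] for s, t in combinations(sorted(members), 2))

def clique_splits(x, first, size):
    """All splits of the first associates of x into two sets of `size`
    treatments, each consisting of mutual first associates."""
    assoc = sorted(first[x])
    anchor = assoc[0]            # fix one member so each split is counted once
    splits = []
    for rest in combinations(assoc[1:], size - 1):
        a = {anchor, *rest}
        b = set(assoc) - a
        if is_clique(a, first) and is_clique(b, first):
            splits.append((a, b))
    return splits

split_counts = {}
for n in (5, 6):
    first = triangular_first_associates(n)
    split_counts[n] = {len(clique_splits(x, first, n - 2)) for x in first}
print(split_counts)
```

A single count of 1 for every treatment agrees with the remark that the division into two sets, when it exists, is unique.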


5. Uniqueness of Triangular Association Scheme for $n \ge 9$. A lemma similar 
to Lemmas 1 and 2 can be proved for this case which implies that the association 
scheme is triangular if $n \ge 9$. The proof is omitted, as another proof has already 
been given by Connor [3]. 


REFERENCES 


[1] R. C. Bose and K. R. Nair, "Partially balanced incomplete block designs", Sankhyā, 
Vol. 4 (1939), pp. 337-372. 

[2] R. C. Bose and T. Shimamoto, "Classification and analysis of partially balanced 
designs with two associate classes", J. Amer. Stat. Assn., Vol. 47 (1952), pp. 151-190. 

[3] W. S. Connor, "The uniqueness of the triangular association scheme", Ann. Math. 
Stat., Vol. 29 (1958), pp. 262-266. 





ON A GENERALISATION OF THE KRONECKER PRODUCT DESIGNS^1 


By B. V. Shah 
University of Bombay 


1. Summary and introduction. The use of incomplete block designs for estimat- 
ing and judging the significance of the difference of treatment effects is now 
standard. Fisher and Yates [2] have provided a complete table of balanced in- 
complete block designs (BIB) for the value of r, k < 10. In this paper a method 
of constructing a special class of partially balanced incomplete block designs 
(PBIB) from the known BIB designs is given. Vartak [6] and Sillito [4] have 
used the Kronecker product of matrices to construct statistical designs. Their 
method and the method given in this paper differ only in the fact that in their 
method two distinct elements of a matrix are replaced by two distinct matrices, 
while in the method considered in this paper, two or more distinct elements of a 
matrix are replaced by different matrices. All the PBIB designs considered in this 
paper have three associate classes and the rectangular association scheme: the 
v = pq treatments are written in p rows and q columns so that treatments 
in the same row are first associates, treatments in the same column are 
second associates and the rest are third associates. 


2. Notation and some definitions. Throughout this paper, parameters of a 
BIB design will be denoted by $v^*, b^*, r^*, k^*, \lambda^*$. The parameters of a PBIB will 
be denoted by $v, b, r, k, \lambda_1, \ldots, \lambda_m, n_1, \ldots, n_m, p^i_{jk}$, as defined by Bose and 
Nair [1] and later generalised by Nair and Rao [3]. 

The following three designs, with the given incidence matrices and parameters 
will be included in the BIB’s in this paper. 

(a) A null design with the incidence matrix $O(v^*, b^*)$ [the $v^* \times b^*$ matrix with 
all the elements equal to zero]. Parameters: $v^*$, $b^*$, $r^* = k^* = \lambda^* = 0$. 

(b) A randomised block design with the incidence matrix $E(v^*, b^*)$ [the $v^* \times b^*$ 
matrix with all the elements equal to unity]. Parameters: $v^*$, $b^*$, $r^* = \lambda^* = b^*$, 
$k^* = v^*$. 

(c) A design with the incidence matrix $I(v^*)$ [the $v^* \times v^*$ identity matrix]. 
Parameters: $v^* = b^*$, $r^* = k^* = 1$, $\lambda^* = 0$. 

DEFINITION 2.1. Two BIB designs with incidence matrices $N_1$ and $N_2$ and 
parameters $v_1^*, b_1^*, r_1^*, k_1^*, \lambda_1^*$ and $v_2^*, b_2^*, r_2^*, k_2^*, \lambda_2^*$, respectively will be 
called associable designs, if and only if $b_1^* = b_2^*$, $v_1^* = v_2^*$ and the design 
$N_{12} = \begin{bmatrix} N_1 \\ N_2 \end{bmatrix}$ formed by joining the corresponding blocks of the two designs has the 
following properties: 

(i) The ith treatment of $N_1$ occurs with the ith treatment of $N_2$ exactly $\mu_{12}$ 
times for all $i = 1, 2, \ldots, v_2^*$; 


Received March 3, 1958; revised August 1, 1958. 


1 This work was supported by a Research Training Scholarship of the Government of 
India. 




(ii) The ith treatment of $N_1$ occurs with the jth treatment of $N_2$ exactly $\eta_{12}$ 
times for all $i \ne j$; $i, j = 1, 2, \ldots, v_2^*$. 

The following results are direct consequences of the above definition: 

A BIB design ($N_1$) with parameters $v^*, b^*, r^*, k^*, \lambda^*$ is associable with 

(i) a null design $\{N_2 = O(v^*, b^*)\}$ and $\mu_{12} = 0 = \eta_{12}$, 

(ii) a randomised block design $\{N_2 = E(v^*, b^*)\}$ and $\mu_{12} = r^* = \eta_{12}$, 

(iii) its complementary design $\{N_2 = E(v^*, b^*) - N_1\}$ and $\mu_{12} = 0$, $\eta_{12} = r^* - \lambda^*$, and 

(iv) itself ($N_1$) and $\mu_{11} = r^*$, $\eta_{11} = \lambda^*$. 
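
These four consequences can be checked numerically on a small design. The following sketch assumes the (7, 7, 3, 3, 1) design developed cyclically from the block {0, 1, 3}; the function names are hypothetical, and `association` simply counts the concurrences required by Definition 2.1.

```python
def fano_incidence():
    # Blocks of the (7, 7, 3, 3, 1) design developed cyclically from {0, 1, 3}.
    blocks = [{(s + d) % 7 for d in (0, 1, 3)} for s in range(7)]
    return [[1 if t in blk else 0 for blk in blocks] for t in range(7)]

def association(N1, N2):
    """Return (mu, eta) if the two designs are associable, otherwise None.

    mu  : number of blocks in which treatment i of N1 meets treatment i of N2;
    eta : number of blocks in which treatment i of N1 meets treatment j != i of N2.
    """
    v, b = len(N1), len(N1[0])
    if (len(N2), len(N2[0])) != (v, b):
        return None
    conc = [[sum(N1[i][c] * N2[j][c] for c in range(b)) for j in range(v)]
            for i in range(v)]
    mus = {conc[i][i] for i in range(v)}
    etas = {conc[i][j] for i in range(v) for j in range(v) if i != j}
    if len(mus) == 1 and len(etas) == 1:
        return mus.pop(), etas.pop()
    return None

N1 = fano_incidence()                       # r* = 3, lambda* = 1
null = [[0] * 7 for _ in range(7)]          # O(v*, b*)
full = [[1] * 7 for _ in range(7)]          # E(v*, b*)
comp = [[1 - x for x in row] for row in N1] # E(v*, b*) - N1

for name, N2 in [("null", null), ("full", full), ("complement", comp), ("itself", N1)]:
    print(name, association(N1, N2))
```

With r* = 3 and λ* = 1 the four pairs printed should match (0, 0), (r*, r*), (0, r* − λ*) and (r*, λ*) respectively.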

DEFINITION 2.2. Let $A = (a_{ij})$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, be a matrix 
whose elements take the S values 1, 2, ..., S. We shall call matrix A a balanced 
matrix in S integers if and only if the following conditions are satisfied: 

(i) The number of times the integer p occurs in a column is the same for all 
the columns and is equal to $\alpha_p$, say. 

(ii) The number of times the integer p occurs in a row is the same for all the 
rows and is equal to $\beta_p$, say. 

(iii) The number of times the combination $\binom{p}{q}$ or $\binom{q}{p}$ occurs in a pair of rows 
is the same for all the pairs of rows and is equal to $\gamma_{pq}$, say. ($\binom{p}{q}$ and $\binom{q}{p}$ will be 
considered as identical combinations). 

From Definition 2.2 we can easily prove the following lemmas: 

LEMMA 2.1. If, in a balanced matrix A in S integers, some of the integers, say h, 
are replaced by 1 and the remaining (S — h) integers by 0, then the matrix A will 
be converted into an incidence matrix of a BIB design. 

LEMMA 2.2. If A is a matrix whose elements take the S values 1, 2, ..., S and if 
any one of the integers is replaced by 1 and the remaining S − 1 by 0, or any two 
of the integers are replaced by 1 and remaining S − 2 are replaced by 0 and if, in 
all these $S + \binom{S}{2}$ ways, it is found that the matrix A is converted into an incidence 
matrix of a BIB design, then the matrix A must be a balanced matrix in S integers. 

COROLLARY TO LEMMAS 2.1 AND 2.2. The necessary and sufficient condition for a 
matrix A to be balanced is that there exist S BIB designs $N_1, N_2, \ldots, N_S$ such 
that $\sum_i N_i = E(m, n)$, $(N_i + N_j)$ is also an incidence matrix of a BIB design for 
all $i \ne j$ and $A = \sum_{i=1}^{S} i N_i$. 
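
The three conditions of Definition 2.2 translate directly into a check on row, column and row-pair counts. The sketch below is an illustration with hypothetical names; as an example it uses the cyclic 7 × 7 matrix whose first row, 1 3 2 2 1 2 1, is quoted for solution (5) in the appendix, and the direction of the cyclic shift is an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations

def balance_parameters(A):
    """Return (alpha, beta, gamma) when A is balanced in the sense of
    Definition 2.2, otherwise None.

    alpha[p]      : occurrences of p in each column,
    beta[p]       : occurrences of p in each row,
    gamma[(p, q)] : occurrences of the unordered combination {p, q}
                    (p <= q) in each pair of rows.
    """
    m, n = len(A), len(A[0])
    col_counts = [Counter(A[i][j] for i in range(m)) for j in range(n)]
    row_counts = [Counter(row) for row in A]
    if any(c != col_counts[0] for c in col_counts):
        return None
    if any(r != row_counts[0] for r in row_counts):
        return None
    pair_counts = [Counter(tuple(sorted(pq)) for pq in zip(r1, r2))
                   for r1, r2 in combinations(A, 2)]
    if any(pc != pair_counts[0] for pc in pair_counts):
        return None
    return dict(col_counts[0]), dict(row_counts[0]), dict(pair_counts[0])

first_row = [1, 3, 2, 2, 1, 2, 1]
# Row s is the first row shifted cyclically s places to the right.
A = [first_row[-s:] + first_row[:-s] for s in range(7)]
params = balance_parameters(A)
print(params)

not_balanced = [[1, 2], [1, 2]]     # columns have different contents
print(balance_parameters(not_balanced))
```

For the cyclic example the column and row counts agree, as they must for a square cyclic matrix, and the pair counts come out the same for every pair of rows.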

LEMMA 2.3. (Generalisation of Lemma 2.1). If there is a balanced matrix A in 
$S_2$ integers, and these $S_2$ integers are divided in $S_1$ ($S_1 < S_2$) groups such that each 
group contains at least one integer and all the elements of a group are replaced by an 
identical integer, then the matrix A will become a balanced matrix in $S_1$ integers. 

It is obvious that the incidence matrix of a BIB is a balanced matrix in two 
integers. Further, from the corollary to Lemmas 2.1 and 2.2 for the existence of a 
balanced matrix in S integers with parameters as given in Definition 2.2, it is 
necessary that several BIB's with $v^* = m$, $b^* = n$ and for all the values of 
$k^* = \alpha_i$ or $\alpha_i + \alpha_j$, $i, j = 1, 2, \ldots, S$ exist separately. The existing BIB's 
satisfying the above conditions and $v, b \le 15$ for $S \ge 3$ are as follows: 


        v* = m    b* = n                α's 
(1)     3         3, 6, 9, 12 or 15     α_1 = α_2 = α_3 = 1 
(2)     4         12                    any values such that Σ α_i = 4 
(3)     5         10                    any values such that Σ α_i = 5 
(4)     6         15                    α_1 = α_2 = α_3 = 2 
(5)     7         7, 14                 α_1 = α_2 = 3, α_3 = 1 
(6)     9         12                    α_1 = α_2 = α_3 = 3 
(7)     11        11                    α_1 = α_2 = 5, α_3 = 1 
(8)     15        15                    α_1 = α_2 = 7, α_3 = 1 


The balanced matrices with the above parameters are constructible and are 
given in the appendix. Any other balanced matrix with $m, n \le 15$ and $S \ge 3$ 
does not exist because the corresponding BIB's do not exist. 


3. Construction of PBIB’s. Using associable BIB’s and a balanced matrix we 
can construct PBIB designs with three associate classes as given by the follow- 
ing theorem. 

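THEOREM 3.1. Let there be S BIB designs with incidence matrices $N_1, N_2, \ldots, N_S$. 
Let the parameters of the ith design be $v^*, b^*, r_i^*, k_i^*$ and $\lambda_i^*$. Let the pth design 
be associable with the qth design with the parameters $\mu_{pq}$ and $\eta_{pq}$ for all p, q = 1, 
2, ..., S. Now let there be an m × n balanced matrix A in S integers 1, 2, ..., S 
with parameters $\alpha_p, \beta_p, \gamma_{pq}$ as defined in 2.2. Now if we replace the integer p in 
matrix A by the matrix $N_p$ (p = 1, 2, ..., S), then matrix A will be converted 
into an incidence matrix of a PBIB with the following parameters: 

\[ v = m v^*, \qquad b = n b^*, \qquad r = \sum_{p=1}^{S} \beta_p r_p^*, \qquad k = \sum_{p=1}^{S} \alpha_p k_p^*, \]

\[ n_1 = v^* - 1, \qquad n_2 = m - 1, \qquad n_3 = n_1 n_2, \]

\[ \lambda_1 = \sum_{p=1}^{S} \beta_p \lambda_p^*, \qquad \lambda_2 = \sum_{p \ge q = 1}^{S} \gamma_{pq}\,\mu_{pq}, \qquad \lambda_3 = \sum_{p \ge q = 1}^{S} \gamma_{pq}\,\eta_{pq}, \]

\[ (p^1_{jk}) = \begin{pmatrix} n_1 - 1 & 0 & 0 \\ & 0 & n_2 \\ & & (n_1 - 1) n_2 \end{pmatrix}, \qquad
   (p^2_{jk}) = \begin{pmatrix} 0 & 0 & n_1 \\ & n_2 - 1 & 0 \\ & & (n_2 - 1) n_1 \end{pmatrix}, \]

\[ (p^3_{jk}) = \begin{pmatrix} 0 & 1 & n_1 - 1 \\ & 0 & n_2 - 1 \\ & & (n_1 - 1)(n_2 - 1) \end{pmatrix}. \]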







PROOF: Expressions for v, b, r, k are quite obvious and need no proof. Others 
can be proved as follows: 

From the method of construction, it is obvious that we have m groups of $v^*$ 
treatments each corresponding to a row of matrix A. The treatments belonging 
to the same group will be called the first associates. The ith treatment of one 
group and the ith treatment of another group will be called the second associates. 
The ith treatment of one group and the jth treatment ($i \ne j$) of another group 
will be called the third associates. Expressions for $n_i$'s and $p^i_{jk}$'s immediately 
follow from this association scheme. 

Now in any row of the matrix A the integer p occurs $\beta_p$ times, therefore $N_p$ 
also occurs $\beta_p$ times. In $N_p$ any pair of treatments occurs together $\lambda_p^*$ times. 
Hence any pair of treatments belonging to the same group occurs exactly 
$\sum_{p=1}^{S} \beta_p \lambda_p^*$ times. Therefore 

\[ \lambda_1 = \sum_{p=1}^{S} \beta_p \lambda_p^*. \]

Similarly, for any pair of rows of the matrix A the combination $\binom{p}{q}$ or $\binom{q}{p}$ 
occurs exactly $\gamma_{pq}$ times. Hence, the ith treatment of one group and the ith 
treatment of another group occur together $\sum \gamma_{pq}\mu_{pq}$ times, and the ith treatment of 
one group and the jth treatment ($i \ne j$) of another group occur together exactly 
$\sum \gamma_{pq}\eta_{pq}$ times. Hence 

\[ \lambda_2 = \sum_{p \ge q = 1}^{S} \gamma_{pq}\,\mu_{pq}, \qquad \lambda_3 = \sum_{p \ge q = 1}^{S} \gamma_{pq}\,\eta_{pq}. \]

This completes the proof of Theorem 3.1. 
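As an illustration let us take the matrix $N_1$ as the incidence matrix of a BIB 
with the parameters $v^*, b^*, r^*, k^*, \lambda^*$, $N_2$ as the incidence matrix of its 
complementary design ($N_2 = E(v^*, b^*) - N_1$), $N_3 = O(v^*, b^*)$ and 

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 1 & 2 \end{pmatrix}. \]

Then 

\[ \mu_{12} = 0, \quad \mu_{13} = 0, \quad \mu_{23} = 0, \]
\[ \eta_{12} = r^* - \lambda^*, \quad \eta_{13} = 0, \quad \eta_{23} = 0. \]

Hence on substituting $N_p$ in place of p in matrix A, (p = 1, 2, 3), we obtain the 
incidence matrix of a PBIB with the parameters 

\[ v = 3v^*, \quad b = 3b^*, \quad r = b^*, \quad k = v^*, \]
\[ \lambda_1 = b^* - 2r^* + 2\lambda^*, \quad \lambda_2 = 0, \quad \lambda_3 = r^* - \lambda^*, \quad \text{etc.} \]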
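
The construction of Theorem 3.1 can be tried out on this illustration. The sketch below is only an example under stated assumptions: $N_1$ is taken to be the (7, 7, 3, 3, 1) design developed cyclically from {0, 1, 3}, A is the cyclic 3 × 3 matrix in the integers 1, 2, 3, and the helper names are hypothetical. The code substitutes the designs into A and collects the concurrences of treatment pairs by associate class.

```python
def fano_incidence():
    # (7, 7, 3, 3, 1) design developed cyclically from the block {0, 1, 3}.
    blocks = [{(s + d) % 7 for d in (0, 1, 3)} for s in range(7)]
    return [[1 if t in blk else 0 for blk in blocks] for t in range(7)]

def substitute(A, designs):
    """Replace each integer p of A by the incidence matrix designs[p]."""
    rows = []
    for a_row in A:
        v_star = len(designs[a_row[0]])
        for t in range(v_star):
            row = []
            for p in a_row:
                row.extend(designs[p][t])
            rows.append(row)
    return rows

def concurrences_by_class(N, m, v_star):
    """Concurrences of treatment pairs, grouped by associate class:
    1 = same group, 2 = same position in different groups, 3 = the rest."""
    lam = {1: set(), 2: set(), 3: set()}
    for x in range(m * v_star):
        for y in range(x + 1, m * v_star):
            gx, ix = divmod(x, v_star)
            gy, iy = divmod(y, v_star)
            cls = 1 if gx == gy else (2 if ix == iy else 3)
            lam[cls].add(sum(a * b for a, b in zip(N[x], N[y])))
    return lam

N1 = fano_incidence()
N2 = [[1 - x for x in row] for row in N1]   # complementary design
N3 = [[0] * 7 for _ in range(7)]            # null design
A = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]

N = substitute(A, {1: N1, 2: N2, 3: N3})
lam = concurrences_by_class(N, m=3, v_star=7)
replications = {sum(row) for row in N}
block_sizes = {sum(col) for col in zip(*N)}
print(len(N), len(N[0]), replications, block_sizes, lam)
```

With v* = b* = 7, r* = 3 and λ* = 1, the expressions above give v = b = 21, r = k = 7, λ_1 = 3, λ_2 = 0 and λ_3 = 2, which is what the script should report.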




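4. Some particular cases. Let a balanced matrix A in the two integers 0, 1, 
be the incidence matrix of a BIB with the parameters $v_1^*, b_1^*, r_1^*, k_1^*, \lambda_1^*$, and 
let $N_1$ be the incidence matrix of a BIB with the parameters $v_2^*, b_2^*, r_2^*, k_2^*, \lambda_2^*$; 
then by taking $N_0$ as $O(v_2^*, b_2^*)$, $E(v_2^*, b_2^*)$ and $E(v_2^*, b_2^*) - N_1$ respectively we 
have the following three cases: 

i) $N_0 = O(v_2^*, b_2^*)$, we obtain a PBIB with the parameters 

\[ v = v_1^* v_2^*, \quad b = b_1^* b_2^*, \quad r = r_1^* r_2^*, \quad k = k_1^* k_2^*, \]
\[ \lambda_1 = r_1^* \lambda_2^*, \quad \lambda_2 = r_2^* \lambda_1^*, \quad \lambda_3 = \lambda_1^* \lambda_2^*. \]

ii) $N_0 = E(v_2^*, b_2^*)$, we obtain a PBIB with the parameters 

\[ v = v_1^* v_2^*, \quad b = b_1^* b_2^*, \quad r = r_1^* r_2^* + (b_1^* - r_1^*) b_2^*, \]
\[ k = k_1^* k_2^* + (v_1^* - k_1^*) v_2^*, \]
\[ \lambda_1 = r_1^* \lambda_2^* + (b_1^* - r_1^*) b_2^*, \]
\[ \lambda_2 = (b_1^* - 2r_1^* + \lambda_1^*) b_2^* + 2(r_1^* - \lambda_1^*) r_2^* + \lambda_1^* r_2^*, \]
\[ \lambda_3 = (b_1^* - 2r_1^* + \lambda_1^*) b_2^* + 2(r_1^* - \lambda_1^*) r_2^* + \lambda_1^* \lambda_2^*. \]

iii) $N_0 = E(v_2^*, b_2^*) - N_1$, we obtain a PBIB with the parameters 

\[ v = v_1^* v_2^*, \quad b = b_1^* b_2^*, \]
\[ r = r_1^* r_2^* + (b_1^* - r_1^*)(b_2^* - r_2^*), \]
\[ k = k_1^* k_2^* + (v_1^* - k_1^*)(v_2^* - k_2^*), \]
\[ \lambda_1 = r_1^* \lambda_2^* + (b_1^* - r_1^*)(b_2^* - 2r_2^* + \lambda_2^*), \]
\[ \lambda_2 = r_2^* \lambda_1^* + (b_2^* - r_2^*)(b_1^* - 2r_1^* + \lambda_1^*), \]
\[ \lambda_3 = (b_1^* - 2r_1^* + \lambda_1^*)(b_2^* - 2r_2^* + \lambda_2^*) + 2(r_1^* - \lambda_1^*)(r_2^* - \lambda_2^*) + \lambda_1^* \lambda_2^*. \]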
It will be noticed that case (i) above is exactly identical with the Kronecker 
Product of BIB’s as given by M. N. Vartak [6]; the same method also gives some 
of the designs by D. A. Sprott [5]. 
The method employed by G. P. Sillito [4] to construct BIB is identical with 
the case (iii) above with the conditions $A = N_1$ and $b_1^* = 4(r_1^* - \lambda_1^*)$. Also, 
of the two PBIB's with v = b = pq given by R. C. Bose [1], one is a particular 
case of (iii) above with $A = I(p)$ and $N_1 = E(q, q) - I(q)$ and the other 
is identical with (ii) above on taking $A = E(p, p) - I(p)$ and $N_1 = I(q)$. 
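
The role of the condition $b_1^* = 4(r_1^* - \lambda_1^*)$ can be seen by evaluating the case (iii) expressions with both designs equal. The sketch below is a plain evaluation of those expressions; the function name and the parameter sets are chosen here for illustration. The set (4, 4, 3, 3, 2) is the design of all triples on four treatments and (7, 7, 3, 3, 1) is the design used in the earlier examples; (16, 16, 6, 6, 2) is used only as a parameter set satisfying the condition.

```python
def case_iii_lambdas(d1, d2):
    """lambda_1, lambda_2, lambda_3 of case (iii).

    d1 = (v, b, r, k, lam) of the BIB whose incidence matrix is A,
    d2 = (v, b, r, k, lam) of the BIB whose incidence matrix is N_1.
    """
    _, b1, r1, _, l1 = d1
    _, b2, r2, _, l2 = d2
    lam1 = r1 * l2 + (b1 - r1) * (b2 - 2 * r2 + l2)
    lam2 = r2 * l1 + (b2 - r2) * (b1 - 2 * r1 + l1)
    lam3 = ((b1 - 2 * r1 + l1) * (b2 - 2 * r2 + l2)
            + 2 * (r1 - l1) * (r2 - l2) + l1 * l2)
    return lam1, lam2, lam3

def satisfies_condition(d):
    # The condition b = 4(r - lambda) quoted for case (iii) with A = N_1.
    _, b, r, _, lam = d
    return b == 4 * (r - lam)

for design in [(4, 4, 3, 3, 2), (16, 16, 6, 6, 2), (7, 7, 3, 3, 1)]:
    print(design, satisfies_condition(design), case_iii_lambdas(design, design))
```

When the condition holds the three values coincide, so the resulting design is balanced; for (7, 7, 3, 3, 1) it fails and λ_3 differs from λ_1 = λ_2.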







Furthermore it should be noted that if there are two designs of this type with 
the same number of treatments, the same association scheme for three associate 
classes, and the same block size, we can construct a new design of this type, just 
by adjoining the two designs. 

Lastly if in a design $\lambda_1 = \lambda_3$ or $\lambda_2 = \lambda_3$, the design is reduced to a group 
divisible design, and if $\lambda_1 = \lambda_2 = \lambda_3$, the design is reduced to a BIB. 

5. Acknowledgement. The author is grateful to Prof. M. C. Chakrabarti for 
his help and guidance. The author is indebted to one of the referees for his valu- 
able suggestions; in particular the balanced matrices (4), (6), (7) and (8) of the 
appendix are his. 


APPENDIX 


The following is the complete list of the balanced matrices with $m, n \le 15$ 
and $S \ge 3$. 

Balanced matrices with m = 3, n = 6, 9 or 12 and m = 7, n = 14 can be 
obtained by replicating the solutions (1) and (5). Balanced matrices with m = 4, 
S = 3 and m = 5, S = 3, 4 can be obtained from the solutions (2) and (3) by the 
help of Lemma 2.3. 


[The displayed matrices (1) to (8) are not recoverable from the source. The surviving captions are:] 

(5) m = n = 7. First row 1 3 2 2 1 2 1; matrix is cyclic. 

(7) m = n = 11, S = 3. Matrix is cyclic (first row not recoverable). 

(8) m = n = 15, S = 3. Each 7 × 7 block is cyclic. 


REFERENCES 

[1] R. C. Bose and K. R. Nair, "Partially balanced designs," Sankhyā, Vol. 4 (1939), pp. 
337-372. 

[2] R. A. Fisher and F. Yates, Statistical Tables, Oliver and Boyd, 1938. 

[3] K. R. Nair and C. R. Rao, "A note on partially balanced incomplete block designs," 
Science and Culture, Vol. 7 (1941-42), pp. 568-569. 

[4] G. P. Sillito, "An extension property of a class of balanced incomplete block designs," 
Biometrika, Vol. 44 (1957), pp. 278-279. 

[5] D. A. Sprott, "Some series of partially balanced incomplete block designs," Canadian 
Jour. of Math., Vol. 7 (1955), pp. 369-381. 

[6] M. N. Vartak, "On an application of Kronecker product of matrices to statistical 
designs," Ann. Math. Stat., Vol. 26 (1955), pp. 420-438. 





APPLICATION OF A MEASURE OF INFORMATION TO THE DESIGN 
AND COMPARISON OF REGRESSION EXPERIMENTS 


By M. Stone 
British Medical Research Council Applied Psychology Unit 


1. Introduction and Summary. A normal regression experiment can be represented by 


(1.1)    $Y_i = \sum_{j=1}^{k} X_{ij}\theta_j + \eta_i$ 


where $\{\eta_i \mid i = 1, \ldots, n\}$ is a set of normally distributed random variables with 
zero means and non-singular dispersion matrix C, $\theta = (\theta_1, \ldots, \theta_k)$ is the 
parameter-vector of interest and $X = (X_{ij})$ is a known n × k matrix which will be 
called the allocation matrix. The rows of X will be called the allocation vectors. 
We denote the experiment by $\mathcal{E}(X, C)$. We assume that C is known; generally it 
will be a function of X, C(X). The particular realisation of Y will be denoted y. 
The matrix $F = X'C^{-1}X$ is the Fisher-information-matrix of $\mathcal{E}(X, C)$. 

When F is non-singular, one answer to the question "What information does 
y give about $\theta$?" is to quote $F^{-1}$, the dispersion matrix of the 
maximum-likelihood-estimates of $\theta$. A strong argument in favour of this is that $F^{-1}$ is 
independent of both $\theta$ and y. The fact that it is independent of $\theta$ means that the answer 
is not "local"; the fact that it is independent of y leads to simplicity. This 
approach is taken by Box and Hunter [1] in their work on rotatable designs. 
However, we must accept the fact that many experimenters wish to have a 
one-dimensional answer to the question i.e. we must associate with $\mathcal{E}(X, C)$ a single 
number which we call the "information". For instance Elfving [5] has developed 
the use of trace $F^{-1}$. In this paper we adopt the measure of information 
introduced by Lindley [7]. In Section 2 we generalise Lindley's treatment of the 
regression situation to include the singular case, explain the uses of the measure 
and compare it with that of Elfving. Section 3 deals with the analogue of Elfving's 
main theorem. Theorems 4.1 and 4.2 of Section 4 provide links with the traditional 
variance approach. In Section 5 we derive the asymptotic form of the 
measure as the n of (1.1) increases and show that this form can be derived also 
from Neyman-Pearsonian theory. In Section 6 the influence of nuisance parameters 
is discussed and an analogue of a theorem of Chernoff [2] is established. 


2. The information measure is defined in the Bayesian framework. Generally, if 
before experimentation we express our knowledge of $\theta$ by the prior distribution 
$p(\theta)$ and, after the experiment defined by the set of probability density functions 
$p(y/\theta)$, we obtain a posterior distribution $p(\theta/y)$, then the gain of information is 


Received November 4, 1957; revised March 15, 1958. 


defined as the functional 

(2.1)    $\Delta I = I_{p(\theta/y)} - I_{p(\theta)}$ 

where $I_{p(u)} = \int p(u) \log p(u)\, du$. 


LEMMA 2.1. If, before $\mathcal{E}(X, C)$, $\theta$ is normally distributed with mean $\mu$ and 
non-singular dispersion matrix A then 


(2.2)    $\Delta I = \tfrac{1}{2} \log |I + AF|$. 


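PROOF. 

\[
\begin{aligned}
I_{p(\theta)} &= \int p(\theta) \log\left[(2\pi)^{-k/2} |A|^{-1/2} \exp\{-\tfrac{1}{2}(\theta - \mu)'A^{-1}(\theta - \mu)\}\right] d\theta \\
&= -\tfrac{1}{2} \log\left[(2\pi)^{k} |A|\right] - \tfrac{1}{2} \int p(\theta)\,(\theta - \mu)'A^{-1}(\theta - \mu)\, d\theta \\
&= -\tfrac{1}{2} \log\left[(2\pi)^{k} |A|\right] - \tfrac{1}{2} \sum_{i=1}^{k} \sum_{j=1}^{k} A^{ij} A_{ij} \\
&= -\tfrac{1}{2} \log\left[(2\pi)^{k} |A|\right] - \tfrac{1}{2} k,
\end{aligned}
\]

where $A^{ij}$ denotes the (i, j)th element of $A^{-1}$. Also 

\[
p(\theta/y) = p(y/\theta)\,p(\theta)/p(y)
\propto \exp\left[-\tfrac{1}{2}(y - X\theta)'C^{-1}(y - X\theta) - \tfrac{1}{2}(\theta - \mu)'A^{-1}(\theta - \mu)\right].
\]

Therefore $p(\theta/y)$ is normal with dispersion matrix 

\[ (X'C^{-1}X + A^{-1})^{-1} = (F + A^{-1})^{-1}. \]

Hence 

\[ I_{p(\theta/y)} = -\tfrac{1}{2} \log\left[(2\pi)^{k} / |F + A^{-1}|\right] - \tfrac{1}{2} k \]

and 

\[ \Delta I = \tfrac{1}{2} \log\left(|F + A^{-1}|\,|A|\right) = \tfrac{1}{2} \log |I + AF|. \]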
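
As a numerical illustration of Lemma 2.1, the following sketch evaluates the gain of information for a small straight-line experiment. It assumes C = I, a diagonal prior dispersion matrix chosen arbitrarily for the example, and hypothetical helper names; the determinant routine is ordinary Gaussian elimination, not anything taken from the paper.

```python
import math

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def information_gain(X, A):
    """Delta I = (1/2) log |I + AF| with F = X'X, i.e. assuming C = I."""
    F = matmul(transpose(X), X)
    AF = matmul(A, F)
    k = len(A)
    I_plus_AF = [[(1.0 if i == j else 0.0) + AF[i][j] for j in range(k)]
                 for i in range(k)]
    return 0.5 * math.log(det(I_plus_AF))

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # allocation vectors (1, x), x = 0, 1, 2
A = [[2.0, 0.0], [0.0, 0.5]]               # prior dispersion matrix (example values)
A_inv = [[0.5, 0.0], [0.0, 2.0]]

gain = information_gain(X, A)
F = matmul(transpose(X), X)
F_plus_A_inv = [[F[i][j] + A_inv[i][j] for j in range(2)] for i in range(2)]
alternative = 0.5 * math.log(det(F_plus_A_inv) * det(A))
print(gain, alternative)
```

The two printed values correspond to the two forms of the gain in the last line of the proof and should agree up to rounding.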


Among the class of regression experiments for which it is reasonable to take a 
normal prior distribution for $\theta$ with dispersion matrix A, the expression for $\Delta I$ 
just derived proves useful in three ways: 

Use 1. If we decide to experiment until the gain of information reaches a certain 
level then the fact that $\Delta I$ is independent of y allows us to state in advance 
whether a particular experiment will give us the required gain. Among experiments 
which do, we may choose the one which is most economical in some sense. 
This is the case of fixed-sample-size-experimentation. 

Use 2. Two experiments can be compared in the following sense: 

"Any result of $\mathcal{E}(X_1, C_1)$ will give $\Delta I_1 - \Delta I_2$ more information than any result 
of $\mathcal{E}(X_2, C_2)$". We may note that we are not obliged to use the average gain of 
information (as defined by Lindley [7]) to compare $\mathcal{E}(X_1, C_1)$ and $\mathcal{E}(X_2, C_2)$ 
although the result of doing so would be the same. 







Use 3. If we have a choice of performing any experiment from a given class, 
we may choose the $\mathcal{E}(X, C)$ which maximises $\Delta I$. $\Delta I$ possesses two advantages 
over the measure trace $F^{-1}$: 

(i) If $\phi = M\theta$ is a non-singular linear transformation then $\Delta I$ is the same 
whether we consider information about $\theta$ or about $\phi$. For 

(2.3)    $\tfrac{1}{2} \log |I + AF| = \tfrac{1}{2} \log\left[\,|M|\,|I + AF|\,|M^{-1}|\,\right] = \tfrac{1}{2} \log |I + (MAM')(XM^{-1})'C^{-1}(XM^{-1})|.$ 

But $MAM'$ is the prior dispersion matrix for $\phi$ and, under the transformation, 
$\mathcal{E}(X, C) \to \mathcal{E}(XM^{-1}, C)$ so that (2.3) is the $\Delta I$ for $\phi$. 

(ii) $\Delta I$ can be used even when not all the $\theta_i$ are estimable i.e. when F is singular, 
whereas in this case trace $F^{-1}$ becomes infinite. For $|I + AF| = |A|\,|A^{-1} + F|$, 
$A^{-1}$ is positive-definite and, although F is singular, it remains 
positive-semi-definite. Hence $|A^{-1} + F|$ and therefore $|I + AF|$ are non-zero. 
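
Both advantages can be seen on a two-parameter example. In the sketch below, which assumes C = I and uses hypothetical helper names and arbitrarily chosen matrices A and M, all three observations are taken at the same allocation vector, so that F is singular; the gain is nevertheless finite, and it is unchanged when the parameters are transformed by a non-singular M.

```python
import math

def mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def tr(P):
    return [list(col) for col in zip(*P)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def gain(X, A):
    """Delta I = (1/2) log |I + AF| for k = 2 and C = I, with F = X'X."""
    AF = mul(A, mul(tr(X), X))
    return 0.5 * math.log(det2([[1 + AF[0][0], AF[0][1]],
                                [AF[1][0], 1 + AF[1][1]]]))

X = [[1.0, 1.0]] * 3               # three observations at the same allocation vector
A = [[2.0, 0.0], [0.0, 0.5]]       # prior dispersion matrix of theta (example values)
M = [[2.0, 1.0], [0.0, 1.0]]       # non-singular transformation phi = M theta

F = mul(tr(X), X)                  # singular: only theta_1 + theta_2 is estimable
gain_theta = gain(X, A)
gain_phi = gain(mul(X, inv2(M)), mul(mul(M, A), tr(M)))
print(det2(F), gain_theta, gain_phi)
```

Here trace $F^{-1}$ is undefined because F has zero determinant, while the two gains are finite and equal.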


3. In connection with Use 3 we proceed to show that a theorem proved by
Elfving using trace $F^{-1}$ still holds if we adopt $\Delta I$. The theorem is concerned with
the following problem: "Given $g$ possible allocation vectors $x(1), \ldots, x(g)$ which
are linearly independent, we are to make $n$ independent observations where $n$
is large and each observation can be made at any of the allocation vectors. How
are the observations to be allocated to maximise $\Delta I$?". To answer this we need
a lemma.

LEMMA 3.1. If $F(n_1, \ldots, n_g)$ is the Fisher-information-matrix of the experiment
consisting of $n_i$ observations at $x(i)$ $(i = 1, \ldots, g)$ with the errors $(\eta)$
uncorrelated and if we replace $n_i$ in this matrix by $np_i$, $p_i \ge 0$, $\sum_{i=1}^{g} p_i = 1$, to obtain the
matrix $F^*[np_1, \ldots, np_g]$ then in general $|I + AF^*|$ is maximum when no more
than $\tfrac{1}{2}k(k + 1)$ of the $p_i$'s are non-zero.

PROOF. We have assumed that the non-diagonal elements of $C(X)$ are zero.
There is no further loss of generality in assuming that $C(X) = I$ for we can always
write $\lambda_i x(i)$ for $x(i)$ so that this is so. We find that the $(r, s)$th element of
$F(n_1, \ldots, n_g)$ is $\sum_{i=1}^{g} n_i x_r(i) x_s(i)$. Then

$$F^*_{rs} = \sum_{i=1}^{g} n p_i\, x_r(i) x_s(i).$$


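There are two possibilities:

(a) If there exists an $i$ such that $|I + AF^*|$ is maximum when $p_i = 1$ then
since $1 \le \tfrac{1}{2}k(k + 1)$ the lemma holds.

(b) If $|I + AF^*|$ is maximum at $p = p(m)$ when more than one $p_i(m)$ is
non-zero, we proceed as follows.

$$\frac{\partial}{\partial p_i}|I + AF^*| = |A|\,\frac{\partial}{\partial p_i}|A^{-1} + F^*| = |A| \sum_{r=1}^{k} D_r(i)$$

where $D_r(i)$ denotes the determinant $|A^{-1} + F^*|$ with its $r$th row replaced by
$(n x_r(i) x_1(i), \ldots, n x_r(i) x_k(i))$,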







for $i = 1, \ldots, g$ after row-by-row differentiation. Expanding each of these
determinants about the row which has been differentiated we have

$$\frac{\partial}{\partial p_i}|I + AF^*| = n|A|\,x'(i)\,Q[p]\,x(i)$$


where $Q[p] = \operatorname{adj}(A^{-1} + F^*)$. Now $\sum p_i = 1$ so that, applying the method of
undetermined multipliers to variations of the $p_i$'s for $i$'s for which $p_i(m) \ne 0$,
we see that there exists a $\lambda$ such that

$$\left(\frac{\partial}{\partial p_i}|I + AF^*|\right)_{p(m)} = \lambda$$

for $i$ such that $p_i(m) \ne 0$. Hence for such $i$

$$(3.1)\qquad x'(i)\,Q[p(m)]\,x(i) = \lambda' \quad\text{where } \lambda' = \lambda / n|A|.$$


Now $x'Q[p(m)]x = \lambda'$ is the equation of a central quadric. In general not more
than $\tfrac{1}{2}k(k + 1)$ of the given allocation vectors can lie on a central quadric and
hence the lemma is established. To make the lemma fully rigorous, we would
need to consider the possibility that more than $\tfrac{1}{2}k(k + 1)$ of the vectors do lie on
a central quadric. We omit this consideration since it is rather tedious.

The lemma relates directly to the problem stated. From (2.2)

$$\Delta I = \tfrac{1}{2}\log|I + AF(n_1, \ldots, n_g)| = \tfrac{1}{2}\log|I + AF^*[np_1, \ldots, np_g]|$$

evaluated at $p_i = n_i/n$. So that if we do the experiment consisting of $[np_i(m)]$
observations at $x(i)$ $(i = 1, \ldots, g)$, which involves using at most $\tfrac{1}{2}k(k + 1)$ of
$x(1), \ldots, x(g)$, for $n$ large we will be making $n - f$ observations in all where
$f \le g$. The allocation indicated will provide (asymptotically) the maximum
$\Delta I$. Thus:

THEOREM 3.1. To achieve maximum $\Delta I$ for the above problem it is not necessary to
use more than $\tfrac{1}{2}k(k + 1)$ of the given allocation vectors.

For $n$ not large the theorem is not necessarily true. Although it does not
specify $p(m)$, it is nevertheless useful in providing a rule for rejecting some
experiments for using too many allocation vectors. Generally the calculation of
$p(m)$ is not feasible. However when $k = 2$ and the elements of $(AF^*[np(m)])^{-1}$
are small, we may proceed to obtain $p(m)$ approximately. By Theorem 3.1 only
three allocation vectors need be used. Consider them as the only vectors given:
$x(1), x(2), x(3)$. Approximately

$$Q[p] = \begin{pmatrix} \sum_j n p_j\, x_2^2(j) & -\sum_j n p_j\, x_1(j) x_2(j) \\ -\sum_j n p_j\, x_2(j) x_1(j) & \sum_j n p_j\, x_1^2(j) \end{pmatrix}$$

and hence

$$x'(i)\,Q[p]\,x(i)/n \approx x_1^2(i)\Big[\sum_j p_j x_2^2(j)\Big] - 2x_1(i)x_2(i)\Big[\sum_j p_j x_1(j)x_2(j)\Big]
+ x_2^2(i)\Big[\sum_j p_j x_1^2(j)\Big] = \sum_{j=1}^{3} p_j E_{ij} \quad (i = 1, 2, 3)$$







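where $E_{ij} = [x_1(i)x_2(j) - x_2(i)x_1(j)]^2$. We note that $E_{ij} > 0$ when $i \ne j$ since
the vectors $x(i)$ are supposed independent. Then, if $p_i(m) \ne 0$ $(i = 1, 2, 3)$,
equation (3.1) gives

$$\begin{pmatrix} 0 & E_{12} & E_{13} \\ E_{21} & 0 & E_{23} \\ E_{31} & E_{32} & 0 \end{pmatrix}
\begin{pmatrix} p_1(m) \\ p_2(m) \\ p_3(m) \end{pmatrix} = \begin{pmatrix} \lambda'' \\ \lambda'' \\ \lambda'' \end{pmatrix}$$

where $\lambda'' = \lambda'/n = \lambda/n^2|A|$. Using $\sum p_i(m) = 1$ we get

$$(3.2)\qquad p_i(m) = E_{jk}(E_{ij} + E_{ik} - E_{jk}) \Big/ \sum_{i=1}^{3} E_{jk}(E_{ij} + E_{ik} - E_{jk})$$

for $i = 1, 2, 3$; $(j, k) = (1, 2, 3) - (i)$. Also

$$(3.3)\qquad \lambda'' = 2E_{12}E_{13}E_{23} \Big/ \sum_{i=1}^{3} E_{jk}(E_{ij} + E_{ik} - E_{jk}).$$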


Hence for $p_i(m) > 0$ $(i = 1, 2, 3)$ either

$$(3.4)\qquad E_{ij} + E_{ik} > E_{jk} \quad (i = 1, 2, 3)$$

or

$$(3.5)\qquad E_{ij} + E_{ik} < E_{jk} \quad (i = 1, 2, 3).$$

The possibility of (3.5) can be readily excluded. We see that (3.3) and (3.5) imply
$\lambda'' < 0$ but since $Q$ is positive-definite $0 < x(i)'Qx(i) = \lambda' = n\lambda''$ or $\lambda'' > 0$,
which is a contradiction. Therefore only (3.4) is consistent with $p_i(m) > 0$
$(i = 1, 2, 3)$. Equation (3.4) is not always satisfied by three given vectors. For
example, suppose $x(2)$ lies between $x(1)$ and $x(3)$ (in their two-dimensional
representation) and

$$|x(2)| < \min(|x(1)|, |x(3)|).$$

We have $E_{ij} = |x(i) \times x(j)|^2 = 4\Delta_{ij}^2$ where $\Delta_{ij}$ = "area between $x(i)$ and $x(j)$".
Clearly $\Delta_{13} > \Delta_{12} + \Delta_{23}$; therefore $\Delta_{13}^2 > \Delta_{12}^2 + \Delta_{23}^2$; therefore
$E_{13} > E_{12} + E_{23}$.

However if (3.4) is satisfied the $p(m)$ given by (3.2) is that which
approximately maximises $\Delta I$. Also since $Q$ is positive-definite $x'Qx = \lambda'$ is an ellipse, so
that we need consider only triples of allocation vectors which lie on central
ellipses.

We now evaluate the asymptotic maximum of $\Delta I$:

(a) When (3.4) holds

$$\max \Delta I = \tfrac{1}{2}\log|I + AF^*[np(m)]| \approx \tfrac{1}{2}\log|A| + \tfrac{1}{2}\log|F^*[np(m)]|.$$

Now $|F^*[np]|$ is a homogeneous polynomial of degree two in $p_i$ $(i = 1, 2, 3)$.
Therefore

$$|F^*[np]| = \tfrac{1}{2}\sum_{i} p_i \frac{\partial}{\partial p_i}|F^*[np]|.$$







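But, for the $p(m)$ of (3.2),

$$\left(\frac{\partial}{\partial p_i}|F^*[np]|\right)_{p(m)} = \lambda/|A|$$

so that

$$|F^*[np(m)]| = \tfrac{1}{2}\lambda/|A| = n^2 E_{12}E_{13}E_{23} \Big/ \sum_{i} E_{jk}(E_{ij} + E_{ik} - E_{jk}).$$

Hence

$$(3.6)\qquad \max \Delta I \approx \log n + \tfrac{1}{2}\log|A| + \tfrac{1}{2}\log\Big[E_{12}E_{13}E_{23} \Big/ \sum_{i} E_{jk}(E_{ij} + E_{ik} - E_{jk})\Big].$$

(b) When (3.4) is not satisfied, one of the vectors must have its associated $p_i$
zero. Suppose $p_3(m) = 0$. Then the equations (3.1) lead to $p_2(m)E_{12} = \lambda'/n$ and
$p_1(m)E_{12} = \lambda'/n$ so that $p_1(m) = p_2(m) = \tfrac{1}{2}$ and $\lambda'/n = \tfrac{1}{2}E_{12}$ or $\lambda =
\tfrac{1}{2}n^2|A|E_{12}$. Hence

$$(3.7)\qquad \max \Delta I \approx \tfrac{1}{2}\log|A| + \tfrac{1}{2}\log|F^*| = \tfrac{1}{2}\log|A| + \tfrac{1}{2}\log\left(\tfrac{1}{2}\lambda/|A|\right)
= \log n + \tfrac{1}{2}\log|A| + \tfrac{1}{2}\log(E_{12}/4).$$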


In conclusion for the case $k = 2$ we state the experimental rule as follows:
"Given $g$ allocation vectors select those triples which obey (3.4) and calculate

$$E_{12}E_{13}E_{23} \Big/ \sum_{i} E_{jk}(E_{ij} + E_{ik} - E_{jk}).$$

Also select those pairs which are not members of triples obeying (3.4) and
calculate $E_{12}/4$. Make the observations at the pair or triple which gives the greatest
number, with $n/2$ at each vector for a pair and $np_i(m)$ at $x(i)$ $(i = 1, 2, 3)$ for a
triple where $p(m)$ is given by (3.2)."
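The rule just stated can be sketched in a few lines. This is an illustrative implementation under the approximations of this section only; the function names (`E`, `triple_design`, `best_design`) and the test vectors are ours, not the paper's.

```python
from itertools import combinations

def E(x, y):
    """E_ij = [x_1(i) x_2(j) - x_2(i) x_1(j)]^2 for vectors in the plane."""
    return (x[0] * y[1] - x[1] * y[0]) ** 2

def triple_design(x1, x2, x3):
    """Proportions (3.2) and the number compared in the rule, or None if (3.4) fails."""
    e12, e13, e23 = E(x1, x2), E(x1, x3), E(x2, x3)
    if not (e12 + e13 > e23 and e12 + e23 > e13 and e13 + e23 > e12):
        return None
    # Weights E_jk (E_ij + E_ik - E_jk) for i = 1, 2, 3.
    w = [e23 * (e12 + e13 - e23),
         e13 * (e12 + e23 - e13),
         e12 * (e13 + e23 - e12)]
    total = sum(w)
    return [wi / total for wi in w], e12 * e13 * e23 / total

def best_design(vectors):
    """Apply the k = 2 rule; returns (indices, proportions, score)."""
    candidates, pairs_in_triples = [], set()
    for idx in combinations(range(len(vectors)), 3):
        res = triple_design(*(vectors[i] for i in idx))
        if res is not None:
            candidates.append((idx, res[0], res[1]))
            pairs_in_triples.update(combinations(idx, 2))
    for idx in combinations(range(len(vectors)), 2):
        if idx not in pairs_in_triples:
            score = E(vectors[idx[0]], vectors[idx[1]]) / 4
            candidates.append((idx, [0.5, 0.5], score))
    return max(candidates, key=lambda c: c[2])

print(best_design([(1, 1), (0, 1), (1, 0)]))
print(best_design([(1, 0), (0.1, 0.1), (0, 1)]))
```

In the second call the short middle vector violates (3.4), so the rule falls back to the pair formed by the two outer vectors.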

EXAMPLE 1. $k = 2$; $x(1) = (1, 1)$; $x(2) = (0, 1)$; $x(3) = (1, 0)$. Here $E_{12} =
E_{13} = E_{23}$; therefore (3.4) is satisfied and, by (3.2), $p_i(m) = 1/3$ $(i = 1, 2, 3)$.

EXAMPLE 2. $k$ general. Suppose the given allocation vectors lie on the line
$x = (1, x, \ldots, x^{k-1})$. This is the case of polynomial regression. $x'Qx = \lambda'$ is a
polynomial of degree $2k - 2$ in $x$. Therefore in general at most $2k - 2$ of the
vectors lie on a central quadric and therefore need be used.

In his discussion Elfving considers in detail the case $k = 2$. His solution, i.e.
the "best allocation", is rather complicated when three vectors are used. For just
two vectors he obtains

$$p_1(m) = |x(2)| / (|x(1)| + |x(2)|)$$

and

$$p_2(m) = |x(1)| / (|x(1)| + |x(2)|)$$

which clearly conflicts with $p_1(m) = p_2(m) = \tfrac{1}{2}$ using $\Delta I$. The reason for the
difference becomes clear in the case $x(1) = (c_1, 0)$, $x(2) = (0, c_2)$ for which

$$F^*[np] = \begin{pmatrix} n p_1 c_1^2 & 0 \\ 0 & n p_2 c_2^2 \end{pmatrix}.$$

The different answers arise because, effectively, we minimise the product of the
variances of the maximum-likelihood-estimates of $\theta_1$ and $\theta_2$ while Elfving
minimises their sum.
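The contrast between the two criteria can be seen on a grid of allocations. The sketch below assumes the diagonal case $x(1) = (c_1, 0)$, $x(2) = (0, c_2)$ discussed above, with hypothetical lengths `c1 = 1`, `c2 = 3`; the grid search is ours and only illustrates the point.

```python
def variances(p1, c1, c2, n=1):
    """Variances of the two estimates when x(1) = (c1, 0), x(2) = (0, c2)."""
    return 1 / (n * p1 * c1 ** 2), 1 / (n * (1 - p1) * c2 ** 2)

def argmin_on_grid(criterion, c1, c2, steps=1000):
    """Proportion p1 on the grid 1/steps, ..., (steps-1)/steps minimising the criterion."""
    grid = [i / steps for i in range(1, steps)]
    return min(grid, key=lambda p: criterion(*variances(p, c1, c2)))

c1, c2 = 1.0, 3.0   # hypothetical lengths |x(1)| and |x(2)|
p_product = argmin_on_grid(lambda v1, v2: v1 * v2, c1, c2)   # Delta I criterion
p_sum = argmin_on_grid(lambda v1, v2: v1 + v2, c1, c2)       # trace criterion
print(p_product, p_sum, c2 / (c1 + c2))
```

Minimising the product gives equal allocation whatever the lengths, while minimising the sum reproduces Elfving's proportion $|x(2)| / (|x(1)| + |x(2)|)$.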


4. In this section we consider pairs of experiments $\mathcal{E}(X_1, C_1)$, $\mathcal{E}(X_2, C_2)$ and
prove some theorems relating to the cases when $\mathcal{E}_1$ is always to be preferred to
$\mathcal{E}_2$. Write $\Delta I_i(A) = \tfrac{1}{2}\log|I + AF_i|$ $(i = 1, 2)$.

THEOREM 4.1. A necessary and sufficient condition that $\Delta I_1(A) \ge \Delta I_2(A)$ for
all positive-definite $A$ is that $F_1 - F_2$ be positive-semi-definite.

PROOF. Sufficiency. We use the fact that if $L$ and $M$ are positive-definite and
$L - M$ is positive-semi-definite then $|L| \ge |M|$. (Proved by diagonalising
$L$ and $M$.) Put $L = A^{-1} + F_1$ and $M = A^{-1} + F_2$; then if $F_1 - F_2$ is
positive-semi-definite $|A^{-1} + F_1| \ge |A^{-1} + F_2|$ which gives $\Delta I_1(A) \ge \Delta I_2(A)$.

Necessity. $\Delta I_1(A) \ge \Delta I_2(A)$ for all positive-definite $A$ implies that
$|A^{-1} + F_1| \ge |A^{-1} + F_2|$ for all positive-definite $A^{-1}$. Now $F_1$ and $F_2$ are
positive-semi-definite so that there exists a non-singular $P$ such that $P'F_1P$ and $P'F_2P$ are
diagonal.

$$(4.1)\qquad P'F_iP = \operatorname{diag}\big(d_1(i), \ldots, d_k(i)\big) \quad (i = 1, 2)$$

Therefore

$$\big|P'A^{-1}P + \operatorname{diag}\big(d_1(1), \ldots, d_k(1)\big)\big| \ge \big|P'A^{-1}P + \operatorname{diag}\big(d_1(2), \ldots, d_k(2)\big)\big|$$

where $A^{-1}$, and hence $P'A^{-1}P$, is arbitrary positive-definite. Taking $P'A^{-1}P$
diagonal with all diagonal elements large except the $r$th we deduce $d_r(1) \ge
d_r(2)$. Therefore from equations (4.1)

$$P'(F_1 - F_2)P = \operatorname{diag}\big(d_1(1) - d_1(2), \ldots, d_k(1) - d_k(2)\big)$$

is positive-semi-definite. Therefore $F_1 - F_2$ is positive-semi-definite.
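Theorem 4.1 can be illustrated numerically for $k = 2$. The following sketch is not a proof: it samples random positive-definite priors for one hypothetical pair with $F_1 - F_2$ positive-semi-definite, and then shows a hypothetical pair with indefinite difference whose ordering flips with the prior. All matrices and helper names are assumptions made for illustration.

```python
import math
import random

def gain(A, F):
    """Delta I = 0.5 * log|I + A F| for 2 x 2 matrices given as nested lists."""
    m = [[(i == j) + sum(A[i][t] * F[t][j] for t in range(2)) for j in range(2)]
         for i in range(2)]
    return 0.5 * math.log(m[0][0] * m[1][1] - m[0][1] * m[1][0])

def random_pd(rng):
    """A random 2 x 2 positive-definite matrix of the form B B' + 0.1 I."""
    b = [[rng.uniform(-2, 2) for _ in range(2)] for _ in range(2)]
    return [[sum(b[i][t] * b[j][t] for t in range(2)) + 0.1 * (i == j)
             for j in range(2)] for i in range(2)]

# F1 - F2 = diag(1, 0) is positive-semi-definite.
F2 = [[2.0, 1.0], [1.0, 1.0]]
F1 = [[3.0, 1.0], [1.0, 1.0]]
rng = random.Random(0)
always_better = all(gain(A, F1) >= gain(A, F2)
                    for A in (random_pd(rng) for _ in range(1000)))

# G1 - G2 = diag(2, -1) is indefinite: the ordering depends on the prior.
G1, G2 = [[3.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 2.0]]
prior_a = [[10.0, 0.0], [0.0, 0.1]]
prior_b = [[0.1, 0.0], [0.0, 10.0]]
print(always_better,
      gain(prior_a, G1) > gain(prior_a, G2),
      gain(prior_b, G1) > gain(prior_b, G2))
```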

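We now give simpler proofs of theorems due to Ehrenfeld [4].

THEOREM 4.2. If $v_i(t)$ is the variance of the maximum-likelihood estimate of
$t'\theta$ from $\mathcal{E}_i$ and if $F_1 - F_2$ is positive-semi-definite then $v_1(t) \le v_2(t)$ for all $t$ for
which $t'\theta$ is estimable from both $\mathcal{E}_1$ and $\mathcal{E}_2$.

PROOF. We know that $v_1 = \eta_1'F_1\eta_1$ and $v_2 = \eta_2'F_2\eta_2$ where $\eta_1$ and $\eta_2$ are any
solutions of

$$(4.2)\qquad F_i\eta_i = t \quad (i = 1, 2).$$

We have $\eta_1'(F_1 - F_2)\eta_1 \ge 0$ or

$$(4.3)\qquad v_1 - \eta_1'F_2\eta_1 \ge 0$$

while

$$(\eta_1 - \eta_2)'F_2(\eta_1 - \eta_2) = \eta_1'F_2\eta_1 - 2\eta_2'F_2\eta_1 + \eta_2'F_2\eta_2 \ge 0.$$

From (4.2) $\eta_2'F_2\eta_1 = t'\eta_1 = \eta_1'F_1\eta_1 = v_1$; therefore

$$(4.4)\qquad \eta_1'F_2\eta_1 - 2v_1 + v_2 \ge 0.$$

Adding (4.3) and (4.4) we obtain $v_1 \le v_2$.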

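THEOREM 4.3. Given $g$ allocation vectors $x(1), \ldots, x(g)$ and their convex hull
$\mathcal{C} = \{\sum_j \lambda_j x(j) \mid \sum_j \lambda_j = 1, \lambda_j \ge 0\}$, if $F(x_1, \ldots, x_n)$ is the
Fisher-information-matrix of the experiment consisting of $n$ independent observations at $x_1, \ldots, x_n$
(with unit error variance) where $x_i \in \mathcal{C}$ then we may take less than $n + g + 1$
observations at the vertices of $\mathcal{C}$ so that their Fisher-information-matrix, $F_V$, is such
that $F_V - F(x_1, \ldots, x_n)$ is positive-semi-definite.

PROOF. Suppose $x_i = \sum_{j=1}^{g} \lambda_{ij} x(j)$. For each $i$ we have

$$\sum_{j=1}^{g} \lambda_{ij}\, x(j)x'(j) - x_i x_i' = \sum_{j=1}^{g} \lambda_{ij}\, x(j)x'(j) - \sum_{j=1}^{g}\sum_{k=1}^{g} \lambda_{ij}\lambda_{ik}\, x(j)x'(k)$$

and for any $t$

$$t'\Big[\sum_{j=1}^{g} \lambda_{ij}\, x(j)x'(j) - x_i x_i'\Big]t = \sum_{j=1}^{g} \lambda_{ij}\, t'x(j)x'(j)t - t'x_i x_i't
= \sum_{j=1}^{g} \lambda_{ij}\,[t'x(j)]^2 - \Big[\sum_{j=1}^{g} \lambda_{ij}\, t'x(j)\Big]^2 \ge 0.$$

Hence $\sum_{j=1}^{g} \lambda_{ij}\, x(j)x'(j) - x_i x_i'$ is positive-semi-definite for $i = 1, \ldots, n$.
Therefore

$$\sum_{i=1}^{n}\sum_{j=1}^{g} \lambda_{ij}\, x(j)x'(j) - \sum_{i=1}^{n} x_i x_i'$$

is positive-semi-definite. Therefore

$$\sum_{j=1}^{g} \Big(\Big[\sum_{i=1}^{n} \lambda_{ij}\Big] + 1\Big) x(j)x'(j) - F(x_1, \ldots, x_n)$$

is positive-semi-definite where $[a]$ is the integral part of $a$. We can now identify
$F_V = \sum_{j=1}^{g} \big(\big[\sum_{i=1}^{n} \lambda_{ij}\big] + 1\big)\, x(j)x'(j)$ for it is the Fisher-information-matrix of
the experiment in which $\big[\sum_{i=1}^{n} \lambda_{ij}\big] + 1$ independent observations are made at
$x(j)$ $(j = 1, \ldots, g)$ with error variances unity and also

$$\sum_{j=1}^{g} \Big(\Big[\sum_{i=1}^{n} \lambda_{ij}\Big] + 1\Big) \le n + g.$$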
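The construction in the proof is easy to carry out. The following sketch uses exact rational arithmetic on a hypothetical example (three vertices in the plane and four interior points given by their convex weights $\lambda_{ij}$); the names `outer_sum` and `is_psd_2x2` are ours.

```python
from fractions import Fraction as Fr
from math import floor

def outer_sum(points, weights=None):
    """Sum of w * x x' over 2-dimensional points (weights default to 1)."""
    weights = weights or [1] * len(points)
    return [[sum(w * x[r] * x[s] for x, w in zip(points, weights))
             for s in range(2)] for r in range(2)]

def is_psd_2x2(m):
    """A symmetric 2 x 2 matrix is PSD iff its diagonal and determinant are >= 0."""
    return (m[0][0] >= 0 and m[1][1] >= 0
            and m[0][0] * m[1][1] - m[0][1] * m[1][0] >= 0)

vertices = [(1, 0), (0, 1), (1, 1)]                 # x(1), x(2), x(3)
lam = [[Fr(1, 2), Fr(1, 2), 0],                     # row i holds lambda_i1..lambda_i3
       [Fr(1, 3), Fr(1, 3), Fr(1, 3)],
       [0, Fr(1, 4), Fr(3, 4)],
       [1, 0, 0]]
points = [tuple(sum(l * v[c] for l, v in zip(row, vertices)) for c in range(2))
          for row in lam]

# Number of observations at vertex j: [sum_i lambda_ij] + 1.
counts = [floor(sum(row[j] for row in lam)) + 1 for j in range(len(vertices))]
F = outer_sum(points)
F_V = outer_sum(vertices, counts)
diff = [[F_V[r][s] - F[r][s] for s in range(2)] for r in range(2)]
print(counts, sum(counts), is_psd_2x2(diff))
```

Here $n = 4$ and $g = 3$, and the vertex design uses six observations, within the bound $n + g$.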


5. From Section 4 we see that the only case in which the ordering of two
experiments by the criterion $\Delta I(A)$ is the same for all $A$ is when $F_1 - F_2$ is
positive-semi-definite. Clearly since $F_1 - F_2$ may be neither positive- nor
negative-semi-definite, not all pairs of experiments can be compared in this clear-cut
manner. However when $A$ and $F$ are non-singular so is $AF$ and we may write
$|I + AF| = |A|\,|F|\,|I + (AF)^{-1}|$ and

$$\Delta I(A) = \tfrac{1}{2}\log|A| + \tfrac{1}{2}\log|F| + \tfrac{1}{2}\log|I + (AF)^{-1}|.$$

If the elements of $(AF)^{-1}$ are small we have

$$\Delta I(A) \approx \tfrac{1}{2}\log|A| + \tfrac{1}{2}\log|F| \quad\text{and}\quad \Delta I_1(A) - \Delta I_2(A) \approx \tfrac{1}{2}\log(|F_1| / |F_2|).$$

So we obtain the criterion $|F|$. The conditions under which it is valid are when,
roughly speaking, either

(i) all the diagonal elements of $A$ are large representing large prior uncertainty
for all the parameters or

(ii) all the diagonal elements of $F$ are large which is usually so if the $n$ of (1.1)
is large.

We now introduce a criterion based on the Neyman-Pearson theory of tests and
show that a particular case of it leads to the $|F|$ criterion. Lehmann has given
[6] a proof that for the experiment (1.1) the uniformly-most-powerful invariant
test of the hypothesis $H_0: \theta = 0$ is provided by the usual $\mathcal{F}$-test based on

$$\mathcal{F} = \frac{\hat\theta'F\hat\theta / k}{(y - X\hat\theta)'C^{-1}(y - X\hat\theta)/(n - k)}$$

where $\hat\theta$ are the maximum-likelihood estimates of $\theta$. Taking as critical region
$\mathcal{F} > \mathcal{F}_0$, denote by $P_{II}(\theta)$ the probability of error of the second kind under the
alternative hypothesis $H: \theta$. A criterion for design can be stated as follows:
"Take a probability density function for $\theta$, $p(\theta)$, and choose the experiment to
minimise $\int p(\theta)P_{II}(\theta)\,d\theta$." The choice of $p(\theta)$ is arbitrary but in a situation where
we are initially very uncertain about $\theta$ it would be sensible to take $p(\theta)$ to be
normal with mean 0 and diagonal dispersion matrix with variances all equal to $E$
and consider what happens as $E \to \infty$. This we now do and state the theorem:

THEOREM 5.1. If $p(\theta/E)$ is the probability density function just described then
choosing the experiment to minimise $\int p(\theta/E)P_{II}(\theta)\,d\theta$ is equivalent to choosing it to
maximise $|F|$.

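PROOF. Tang has shown [8] that $P_{II}(\theta)$ depends solely on the function $\lambda =
\tfrac{1}{2}\theta'F\theta$. In fact

$$P_{II}(\theta) = \sum_{i=0}^{\infty} c_i \lambda^i e^{-\lambda}/i!$$

where the $c_i$ are functions of $i$, $\mathcal{F}_0$, $k$, $n$ but are independent of $X$ and $C$ and also
$c_i \to 0$ as $i \to \infty$ thus making the series uniformly convergent in $0 \le \lambda < \infty$.

$$\int p(\theta/E)P_{II}(\theta)\,d\theta = (2\pi E)^{-k/2}\int \exp(-\tfrac{1}{2}\theta'E^{-1}\theta)P_{II}(\theta)\,d\theta$$

and

$$\lim_{E \to \infty} \int \exp(-\tfrac{1}{2}\theta'E^{-1}\theta)P_{II}(\theta)\,d\theta = \int P_{II}(\theta)\,d\theta.$$

Therefore

$$\int p(\theta/E)P_{II}(\theta)\,d\theta \sim (2\pi E)^{-k/2}\int P_{II}(\theta)\,d\theta = (2\pi E)^{-k/2}\int \Big(\sum_{i=0}^{\infty} c_i \lambda^i e^{-\lambda}/i!\Big)d\theta
= (2\pi E)^{-k/2}\sum_{i=0}^{\infty} c_i \int \lambda^i e^{-\lambda}\,d\theta / i!$$

since the series is uniformly convergent in $0 \le \lambda < \infty$. Now

$$\int \lambda^i e^{-\lambda}\,d\theta = 2^{-i}\int (\theta'F\theta)^i \exp(-\tfrac{1}{2}\theta'F\theta)\,d\theta
= 2^{-i}(2\pi)^{k/2}|F|^{-1/2}\int (2\pi)^{-k/2}|F|^{1/2}\exp(-\tfrac{1}{2}\theta'F\theta)(\theta'F\theta)^i\,d\theta.$$

Now under the probability density function $(2\pi)^{-k/2}|F|^{1/2}\exp(-\tfrac{1}{2}\theta'F\theta)$, $\theta'F\theta$ is
distributed as chi-square with $k$ degrees of freedom. Hence

$$\int (2\pi)^{-k/2}|F|^{1/2}\exp(-\tfrac{1}{2}\theta'F\theta)(\theta'F\theta)^i\,d\theta = g(i, k)$$

say where $g(i, k)$ is a function of $i$ and $k$ only. Therefore

$$\int p(\theta/E)P_{II}(\theta)\,d\theta \sim E^{-k/2}\Big(\sum_{i=0}^{\infty} c_i\, g(i, k)/i!\,2^i\Big)|F|^{-1/2} \propto |F|^{-1/2}.$$

Hence minimising $\int p(\theta/E)P_{II}(\theta)\,d\theta$ is equivalent to maximising $|F|$.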


We give now an example of the use of the $|F|$ criterion, which has been treated
by Tocher [9] from another viewpoint. If $C = I$ then $F = X'X$. Suppose the
allocation vectors $x_i = (X_{i1}, \ldots, X_{ik})$ $(i = 1, \ldots, n)$ can be varied subject to
the restriction $\sum_{i=1}^{n} X_{ij}^2 = a_j$. Then:

THEOREM 5.2. If $\sum_{i=1}^{n} X_{ij}^2 = a_j$ $(j = 1, \ldots, k)$ where $a_1, \ldots, a_k$ are positive
constants then $|F|$ is maximum when $x_1, \ldots, x_n$ are chosen so that $F_{rs} = 0$, $r \ne s$,
i.e. when the design is orthogonal.

PROOF.

$$F_{rs} = \sum_{i=1}^{n} X_{ir}X_{is}.$$

Therefore, by a well-known property of positive-definite matrices

$$|F| \le \Big(\sum_i X_{i1}^2\Big)\Big(\sum_i X_{i2}^2\Big)\cdots\Big(\sum_i X_{ik}^2\Big) = a_1 a_2 \cdots a_k.$$

But when $F_{rs} = 0$, $r \ne s$, $|F| = a_1 a_2 \cdots a_k$. Therefore $|F|$ is maximum when the
design is orthogonal.
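The inequality used in the proof (the determinant of a positive-definite matrix does not exceed the product of its diagonal elements) can be checked on a small case. The two design matrices below are hypothetical and share the same column sums of squares $a_j = 4$; the helper names are ours.

```python
from math import prod

def gram(X):
    """F = X'X for a design matrix stored as a list of rows."""
    k = len(X[0])
    return [[sum(row[r] * row[s] for row in X) for s in range(k)] for r in range(k)]

def det(a):
    """Determinant by Laplace expansion (adequate for small matrices)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def det_and_bound(X):
    """Return (|F|, a_1 a_2 ... a_k) where a_j is the j-th diagonal element of F."""
    F = gram(X)
    return det(F), prod(F[j][j] for j in range(len(F)))

orthogonal = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
skewed = [[1, 1, 1], [1, 1, 1], [1, 1, -1], [1, -1, -1]]
print(det_and_bound(orthogonal), det_and_bound(skewed))
```

The orthogonal design attains the bound; the second design, with the same $a_j$, falls short of it.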







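6. We now consider the modifications in the $|F|$ criterion imposed by the
presence of nuisance-parameters, $\phi$, which enter linearly into the expressions for
the expectations of our random variables, $Y$, just as the parameters of interest,
$\theta$, do.

Let there be $q$ nuisance parameters and suppose that $Y$ is now normal with
mean $X\theta + Z\phi$ and dispersion matrix $I$. For simplicity take the case where

$$F_1 = \begin{pmatrix} X'X & X'Z \\ Z'X & Z'Z \end{pmatrix}$$

is non-singular. Then $\theta$ and $\phi$ are estimable by the maximum-likelihood estimates
$\hat\theta$ and $\hat\phi$. Write

$$\omega = \begin{pmatrix} \theta \\ \phi \end{pmatrix} \quad\text{and}\quad \hat\omega = \begin{pmatrix} \hat\theta \\ \hat\phi \end{pmatrix}.$$

Then

$$p(\hat\omega/\omega) = (2\pi)^{-(k+q)/2}|F_1|^{1/2}\exp[-\tfrac{1}{2}(\hat\omega - \omega)'F_1(\hat\omega - \omega)].$$

Suppose that the prior distribution for $\omega$ is

$$p(\omega) = (2\pi)^{-(k+q)/2}|D|^{-1/2}\exp[-\tfrac{1}{2}(\omega - \omega_0)'D^{-1}(\omega - \omega_0)]$$

where $D$ is the $(k + q) \times (k + q)$ prior dispersion matrix of $\omega$ whose diagonal blocks
are $A$ and $B$. ($A$ and $B$ are the prior dispersion matrices for $\theta$ and $\phi$ respectively.) By Bayes'
Theorem we find that $p(\omega/\hat\omega)$ is normal with information matrix $(F_1 + D^{-1})$.
To find the information about $\theta$ we must integrate out $\phi$ in (a) $p(\omega)$ and (b)
$p(\omega/\hat\omega)$ to obtain the marginal distributions of $\theta$. We find:

(a) $p(\theta)$ is normal with dispersion matrix $A$

(b) $p(\theta/\hat\omega)$ is normal with dispersion matrix $L$ where $L$ is the leading $k \times k$
diagonal sub-matrix of $(F_1 + D^{-1})^{-1}$. Then

$$I_p(\theta) = -\tfrac{1}{2}\log[(2\pi)^k|A|] - \tfrac{1}{2}k$$

and

$$I_p(\theta/\hat\omega) = -\tfrac{1}{2}\log[(2\pi)^k|L|] - \tfrac{1}{2}k$$

and

$$(6.1)\qquad \Delta I = I_p(\theta/\hat\omega) - I_p(\theta) = -\tfrac{1}{2}\log|L| + \tfrac{1}{2}\log|A|.$$

If the elements of $(DF_1)^{-1}$ are small

$$(F_1 + D^{-1})^{-1} = [F_1(I + (DF_1)^{-1})]^{-1} \approx F_1^{-1}.$$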





Write

$$(6.2)\qquad F_1^{-1} = \begin{pmatrix} \Theta & \gamma \\ \gamma' & \beta \end{pmatrix}$$

where $\Theta$ is the $k \times k$ dispersion matrix of $\hat\theta$ in $p(\hat\omega/\omega)$. Then $L \approx \Theta$ and

$$(6.3)\qquad \Delta I \approx -\tfrac{1}{2}\log|\Theta| + \tfrac{1}{2}\log|A|.$$

The conditions under which the elements of $(DF_1)^{-1}$ are small are when, roughly
speaking, either

(i) all the diagonal elements of $D$ are large corresponding to large prior
ignorance of all the parameters or

(ii) all the diagonal elements of $F_1$ are large corresponding to a "strong"
experiment.

So, by (6.3), we see that under the conditions stated maximising $\Delta I$ is equivalent
to minimising $|\Theta|$.

A. Wald [10] developed the use of $|\Theta|$ which he called the "generalised
variance" but his justification of it was pragmatical rather than logical.

In most problems it is usually a simple matter to calculate $F_1$ from the
allocation vectors. Then by Jacobi's Theorem we obtain $|\Theta| = |Z'Z| / |F_1|$.

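EXAMPLE 1. A simple $2^2$ factorial experiment without interaction with the base
level (both factors absent) as the nuisance parameter. $k = 2$; $q = 1$. With column
blocks of $n_0$, $n_1$, $n_2$, $n_3$ observations,

$$X' = \begin{pmatrix} 0 \cdots 0 & 1 \cdots 1 & 0 \cdots 0 & 1 \cdots 1 \\ 0 \cdots 0 & 0 \cdots 0 & 1 \cdots 1 & 1 \cdots 1 \end{pmatrix}, \qquad
Z' = (1 \cdots\cdots 1).$$

Suppose $n = n_0 + n_1 + n_2 + n_3 = 4m$. Then

$$F_1 = \begin{pmatrix} n_1 + n_3 & n_3 & n_1 + n_3 \\ n_3 & n_2 + n_3 & n_2 + n_3 \\ n_1 + n_3 & n_2 + n_3 & n \end{pmatrix}$$

$$|\Theta| = n/|F_1| = n/(n_1n_2n_3 + n_0n_2n_3 + n_0n_1n_3 + n_0n_1n_2).$$

For minimum $|\Theta|$, $n_i = m$ $(i = 0, 1, 2, 3)$.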
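The closed form for $|F_1|$ and the optimal allocation can be verified by direct computation. The sketch below is illustrative only: it builds $F_1$ from the four treatment combinations, compares its determinant with the closed form, and searches all allocations of a hypothetical total $n = 8$. The names `LEVELS`, `F1`, `generalised_variance` and `closed_form` are ours.

```python
from fractions import Fraction
from itertools import product

# Rows (x_1, x_2, z) of the allocation vectors for the groups n_0, n_1, n_2, n_3.
LEVELS = [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1)]

def det(a):
    """Determinant by Laplace expansion (adequate for small matrices)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def F1(counts):
    """Information matrix sum_i n_i u(i) u(i)' for the given group sizes."""
    return [[sum(n * u[r] * u[s] for n, u in zip(counts, LEVELS))
             for s in range(3)] for r in range(3)]

def generalised_variance(counts):
    """|Theta| = |Z'Z| / |F_1| with Z'Z = n; requires |F_1| > 0."""
    return Fraction(sum(counts), det(F1(counts)))

def closed_form(counts):
    n0, n1, n2, n3 = counts
    return n1 * n2 * n3 + n0 * n2 * n3 + n0 * n1 * n3 + n0 * n1 * n2

n = 8   # hypothetical total number of observations (m = 2)
designs = [c for c in product(range(n + 1), repeat=4)
           if sum(c) == n and det(F1(c)) > 0]
best = min(designs, key=generalised_variance)
print(best, generalised_variance(best))
```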
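EXAMPLE 2. The addition of an interaction term $\theta_3$ to Example 1. With column
blocks of $n_0$, $n_1$, $n_2$, $n_3$ observations,

$$X' = \begin{pmatrix} 0 \cdots 0 & 1 \cdots 1 & 0 \cdots 0 & 1 \cdots 1 \\ 0 \cdots 0 & 0 \cdots 0 & 1 \cdots 1 & 1 \cdots 1 \\ 0 \cdots 0 & 0 \cdots 0 & 0 \cdots 0 & 1 \cdots 1 \end{pmatrix}, \qquad
Z' = (1 \cdots\cdots 1).$$

Suppose $n = 4m$. Then

$$F_1 = \begin{pmatrix} n_1 + n_3 & n_3 & n_3 & n_1 + n_3 \\ n_3 & n_2 + n_3 & n_3 & n_2 + n_3 \\ n_3 & n_3 & n_3 & n_3 \\ n_1 + n_3 & n_2 + n_3 & n_3 & n \end{pmatrix}$$

$$|\Theta| = n/n_0n_1n_2n_3.$$

For minimum $|\Theta|$, $n_i = m$ $(i = 0, 1, 2, 3)$.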
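EXAMPLE 3. $k$ treatments and a control [3]; $q = 1$. With column blocks of
$n_0, n_1, \ldots, n_k$ observations, the $j$th row of $X'$ has 1's in the block of $n_j$ columns
and 0's elsewhere, and $Z' = (1 \cdots\cdots 1)$.
Suppose $n$ is divisible by $k + 1$. Then

$$F_1 = \begin{pmatrix} n_1 & & & n_1 \\ & \ddots & & \vdots \\ & & n_k & n_k \\ n_1 & \cdots & n_k & n \end{pmatrix}$$

$$|\Theta| = n/|F_1| = n/n_0n_1 \cdots n_k$$

which is minimised when $n_i = n/(k + 1)$ $(i = 0, 1, \ldots, k)$.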

The calculation of $|\Theta| = |Z'Z| / |F_1|$, though simple in the examples given, may
be complicated. Then it may be possible to use a method we now elaborate.

DEFINITION. An experiment involving nuisance-parameters is "part-orthogonal"
if $\gamma = 0$. [See (6.2).]

The customary definition of "orthogonal" requires that all non-diagonal
elements of $F_1$ should be zero. However, if $\gamma = 0$, we could achieve this
condition by two separate orthogonal transformations of $\theta$ and $\phi$. From an
informational point of view these are irrelevant.

By Wegner's Theorem $|F_1^{-1}| \le |\Theta|\,|\beta|$ with equality when $\gamma = 0$. But by
Jacobi's Theorem $|X'X| = |\beta| / |F_1^{-1}|$. Hence $|\Theta| \ge 1/|X'X|$ with equality when
$\gamma = 0$. From this we derive the working principle: "Find the design which
maximises $|X'X|$; if this is a part-orthogonal design (experiment) then it is the
design which minimises $|\Theta|$."

The allocation problem of Section 3 remains important in the presence of
nuisance parameters. We prove a theorem which generalises Theorem 3.1. It is
analogous to one due to Chernoff [2]. Now, the allocation vectors are the rows of
the $n \times (k + q)$ matrix $(XZ)$; denote them by $u$. Given $g$ possible allocation
vectors $u(i) = \begin{pmatrix} x(i) \\ z(i) \end{pmatrix}$ $(i = 1, \ldots, g)$ denote by $\Delta I(n_1, \ldots, n_g)$ the information
about $\theta$ in the experiment consisting of $n_i$ observations at $u(i)$ $(i = 1, \ldots, g)$
with the errors $(\eta)$ uncorrelated with unit variance.

THEOREM 6.1.¹ When $n$ is large, $\Delta I(n_1, \ldots, n_g)$ is in general maximised when
no more than $\tfrac{1}{2}k(k + 1 + 2q)$ of the allocation vectors are used. (It is not necessary
that $F_1$ be non-singular.)

PROOF. By (6.1), $\Delta I(n_1, \ldots, n_g) = -\tfrac{1}{2}\log|L(n_1, \ldots, n_g)| + \tfrac{1}{2}\log|A|$ where
$L(n_1, \ldots, n_g)$ is the leading $k \times k$ diagonal sub-matrix of $(F_1 + D^{-1})^{-1}$ with
$F_1 = F_1(n_1, \ldots, n_g)$ and $[F_1]_{rs} = \sum_{i=1}^{g} n_i u_r(i)u_s(i)$. Suppose

$$D^{-1} = \begin{pmatrix} P & S \\ S' & R \end{pmatrix}$$

where $P$ is $k \times k$. Then by Jacobi's Theorem

$$|L(n_1, \ldots, n_g)| = |Z'Z + R| / |F_1 + D^{-1}|.$$

Replace $n_i$ by $np_i$, $p_i \ge 0$, $\sum p_i = 1$:

$$L(n_1, \ldots, n_g) \to L[np], \qquad F_1 \to F_1[np], \qquad Z'Z \to H[np]$$

where

$$F_1[np] = \begin{pmatrix} F[np] & G[np] \\ G'[np] & H[np] \end{pmatrix}$$

say, and

$$|L[np]| = |H[np] + R| / |F_1[np] + D^{-1}|.$$

We show that $|L[np]|$ is minimised when no more than $\tfrac{1}{2}k(k + 1 + 2q)$ of the
$p_i$'s are non-zero:

(a) If $|L[np]|$ is minimum at $p = p(m)$ where $p_i(m) = 1$ then, since $1 \le
\tfrac{1}{2}k(k + 1 + 2q)$, the statement holds.

(b) If $|L[np]|$ is minimum at $p = p(m)$ when more than one $p_i(m) \ne 0$, we
proceed as follows.

$$\frac{\partial}{\partial p_i}\log|L[np]| = \frac{1}{|H[np] + R|}\frac{\partial}{\partial p_i}|H[np] + R|
- \frac{1}{|F_1[np] + D^{-1}|}\frac{\partial}{\partial p_i}|F_1[np] + D^{-1}|.$$

But

$$H[np]_{rs} = \sum_{i=1}^{g} np_i\, z_r(i)z_s(i)$$

$$F_1[np]_{rs} = \sum_{i=1}^{g} np_i\, u_r(i)u_s(i)$$


1 The author is indebted to a referee for suggesting this theorem. 





and by row-by-row differentiation of the determinants we find

$$\frac{\partial}{\partial p_i}|H[np] + R| = n z'(i)\operatorname{adj}(H[np] + R)\,z(i)$$

$$\frac{\partial}{\partial p_i}|F_1[np] + D^{-1}| = n u'(i)\operatorname{adj}(F_1[np] + D^{-1})\,u(i).$$

Therefore

$$(6.4)\qquad \frac{\partial}{\partial p_i}\log|L[np]| = n\big[z'(i)(H[np] + R)^{-1}z(i) - u'(i)(F_1[np] + D^{-1})^{-1}u(i)\big]
= -n u'(i)\,Q[np]\,u(i)$$

where

$$Q[np] = (F_1[np] + D^{-1})^{-1} - \begin{pmatrix} 0 & 0 \\ 0 & (H[np] + R)^{-1} \end{pmatrix}
= (F_1[np] + D^{-1})^{-1}\begin{pmatrix} I & -(G[np] + S)(H[np] + R)^{-1} \\ 0 & 0 \end{pmatrix},$$

$S$ being the upper right-hand block of $D^{-1}$.
Therefore the rank of $Q[np]$ is $k$. But $\sum_{i=1}^{g} p_i = 1$, therefore by the method of
undetermined multipliers applied to variations of the $p_i$'s for $i$'s for which
$p_i(m) \ne 0$, we see that for such $i$ there exists a $\lambda$ such that

$$\left(\frac{\partial}{\partial p_i}\log|L[np]|\right)_{p(m)} = \lambda$$

or, by (6.4), $u'(i)Q[np(m)]u(i) = -\lambda/n$. Hence those allocation vectors which
are used to minimise $|L[np]|$ must lie on a central quadric of rank $k$.
Such a quadric is determined by $(k + q) + (k + q - 1) + \cdots + (q + 1) =
\tfrac{1}{2}k(k + 1 + 2q)$ constants implying that in general no more than $\tfrac{1}{2}k(k + 1 + 2q)$
of the vectors can have their associated $p_i(m)$'s non-zero. Now

$$\Delta I(n_1, \ldots, n_g) = -\tfrac{1}{2}\big(\log|L[np]|\big)_{p_i = n_i/n} + \tfrac{1}{2}\log|A|.$$

Therefore, when $n$ is large, we can say that in general $\Delta I(n_1, \ldots, n_g)$ may be
maximised when no more than $\tfrac{1}{2}k(k + 1 + 2q)$ of the $n_i$ are non-zero and the
theorem is proved.


REFERENCES

[1] G. E. P. Box and J. S. Hunter, "Multifactor experimental designs for exploring
response surfaces," Ann. Math. Stat., Vol. 28 (1957), pp. 195-241.
[2] H. Chernoff, "Locally optimum designs for estimating parameters," Ann. Math.
Stat., Vol. 24 (1953), pp. 586-602.
[3] C. W. Dunnett, "Multiple comparison procedures for comparing several treatments
with control," J. Amer. Stat. Assn., Vol. 50 (1955), pp. 1096-1121.
[4] S. Ehrenfeld, "Complete class theorems in experimental designs," Third Berkeley
Symposium, Vol. 1, pp. 57-67.
[5] G. Elfving, "Optimum allocation in regression theory," Ann. Math. Stat., Vol. 23
(1952), pp. 255-262.
[6] E. L. Lehmann, "Lecture notes," University of California.
[7] D. V. Lindley, "On a measure of the information provided by an experiment," Ann.
Math. Stat., Vol. 27 (1956), pp. 986-1005.
[8] P. C. Tang, "The power function of the analysis of variance," Stat. Res. Memoirs, Vol.
2, pp. 126-149.
[9] K. D. Tocher, "A note on the design problem," Biometrika, Vol. 39 (1952), p. 189.
[10] A. Wald, "On the efficient design of statistical investigations," Ann. Math. Stat., Vol.
14 (1943), pp. 134-140.








NOTE ON ESTIMATING INFORMATION! 


By Colin R. Blyth
University of Illinois and Stanford University 
1. Summary. This note is concerned with estimation of the Shannon-Wiener
measure of information. Low bias estimates are obtained, and their bias and
variance. These estimates are extended to the case where the number of possible
values of the random variable is not known. The estimates are compared
asymptotically with the maximum likelihood estimates. They are also compared
with the minimax estimates (for squared error loss function) for a few special
cases where these are easily found.


2. Introduction. Consider a random variable $Y$ with finitely many distinct
possible values:

$$P(Y = a_i) = p_i, \quad i = 1, \ldots, k.$$

A metric measure of dispersion of $Y$ measures how spread out the distribution of
$Y$ is, in terms of distance in the space of $Y$. If this space has no relevant distance
function (e.g., $k = 3$, $a_1$ = green, $a_2$ = red, $a_3$ = white), there is no relevant
metric measure of dispersion. An absolute measure of dispersion of $Y$ measures
the degree to which the total probability of 1 is broken up into pieces in the
distribution of $Y$. Such a measure is a function of $p_1, \ldots, p_k$ only; is free from
dependence on the $a_i$'s; is large when the probability is much broken up (e.g.,
$p_1, \ldots, p_k = 1/k, \ldots, 1/k$), small when it is not much broken up (e.g.,
$p_1, \ldots, p_k = .99, .01, 0, \ldots, 0$).

In handling both kinds of dispersion measures the following addition property 
plays the same important role: {Divide the values of Y into groups. Dispersion 
of Y = between group dispersion + expected within group dispersion.}. Know- 
ing the distribution of Y gives information useful in predicting Y. Actually 
observing Y gives additional information—enough for perfect prediction. This 
additional information can be called information in Y or unpredictability of Y 
and can be measured by a measure of dispersion. In this language the addition 
property says that the information in observing Y equals the information in 
observing which group Y falls in plus the expected information in observing 
which member of that group. 

For real valued $Y$ the addition property identifies variance (except for a
constant multiplier) among all metric dispersion measures of the form $Ef(|Y -
EY|)$ with $f$ continuous. This easily extends to weighted averages of the partial
variances when $Y$ has values in a Euclidean $n$-space. Similarly the addition
property identifies information or entropy $H = -\sum p_i \log_2 p_i$ (except for a
constant multiplier) among all absolute dispersion measures $f_k(p_1, \ldots, p_k)$ with


Received January 23, 1958; revised April 18, 1958. 
1 Work supported by the Office of Naval Research. 










72 COLIN R. BLYTH 


$f_k$ continuous and $f_k(1/k, \ldots, 1/k)$ an increasing function of $k$, as is proved in
(1).

The addition property leads to very convenient mathematical simplifications.
For this reason variance and information $H$ are very widely used dispersion
measures. But there seems to be little intuitive necessity for the addition
property. Thus in the metric case it seems quite reasonable to use measures like
$E|Y - EY|$ which lack the property, and loss functions other than squared
error. Similarly in the absolute case it would seem quite reasonable to use
measures like the natural chi-square measure $(k - 1) - k\sum (p_i - 1/k)^2$ which lack
the property. Essentially equivalent to this chi-square measure is the following
linear function of it, which is the terms of order up to 2 in a Taylor series for $H$:

$$H_2 = 1 - \frac{C}{2}\Big\{2 - k + \sum_{i=1}^{k} (2p_i - 1)^2/2\Big\}.$$

If $f(1, 0, \ldots, 0) = 0$ is desired we could make the necessary subtraction from
the measure $H_2$.


This note is concerned with estimation of information or entropy of $Y$:

$$H = H(p_1, \ldots, p_k) = -C\sum_{i=1}^{k} p_i \log p_i,$$

where $C = \log_2 e = 1.442695$ and $p_i \log p_i$ is taken to be 0 whenever $p_i = 0$.
Our estimate is to be based on independent repetitions $Y_1, \ldots, Y_n$ of the
experiment $Y$. Then $X_1, \ldots, X_k$, where $X_i$ is the number of $Y$'s with the value
$a_i$, is a sufficient statistic for $p_1, \ldots, p_k$ and has the following multinomial
family of possible distributions:

$$(1)\qquad P(X_1, \ldots, X_k = x_1, \ldots, x_k) = n!\prod_{i=1}^{k} p_i^{x_i}/x_i!,$$
$$x_i = 0, 1, \ldots, n, \quad \sum_{i=1}^{k} x_i = n, \quad 0 \le p_i \le 1, \quad \sum_{i=1}^{k} p_i = 1.$$

We are now concerned, then, with the problem of what function $f_n(X_1, \ldots, X_k)$
to use as an estimate for $H$. The maximum likelihood estimate is considered by
Miller and Madow [2]; it is good when $n$ is large, but is likely to be poor for $n$
small. One reasonable estimate would be the best unbiased one. Upon noticing
that there is no unbiased estimate we will consider instead best estimates with
low bias.


3. Low bias estimation, k known. Since (1) is a complete family of
distributions, the problem of unbiased estimation of any function $g(p_1, \ldots, p_k)$ is
solved by Lehmann and Scheffe [3]. In fact, since $Ef(X_1, \ldots, X_k)$ is for
every function $f$ a polynomial in $p_1, \ldots, p_k$ of degree at most $n$, no functions of
$p_1, \ldots, p_k$ other than such polynomials possess unbiased estimates. And using
the usual factorial notation $x^{(\nu)} = x(x - 1) \cdots (x - \nu + 1)$ we have

$$E\{X_1^{(\nu_1)} \cdots X_k^{(\nu_k)}\} = n^{(\nu_1 + \cdots + \nu_k)}\, p_1^{\nu_1} \cdots p_k^{\nu_k},$$

which reduces to $0 = 0$ whenever $\sum_{i=1}^{k} \nu_i > n$ but not otherwise. We therefore
have

$$E\sum c(\nu_1, \ldots, \nu_k)\, X_1^{(\nu_1)} \cdots X_k^{(\nu_k)} / n^{(\nu_1 + \cdots + \nu_k)} = \sum c(\nu_1, \ldots, \nu_k)\, p_1^{\nu_1} \cdots p_k^{\nu_k}$$

where summation is over any set of $(\nu_1, \ldots, \nu_k)$'s with $\sum_{i=1}^{k} \nu_i \le n$ for every
member. It follows from the completeness that $\sum c(\nu_1, \ldots, \nu_k) X_1^{(\nu_1)} \cdots X_k^{(\nu_k)} /
n^{(\nu_1 + \cdots + \nu_k)}$ is the unique uniformly minimum variance (U.M.V.) unbiased
estimate of $\sum c(\nu_1, \ldots, \nu_k)\, p_1^{\nu_1} \cdots p_k^{\nu_k}$ whenever $\sum_{i=1}^{k} \nu_i \le n$ for every term of the
sum. This solves the problem of unbiased estimation of all functions $g(p_1, \ldots, p_k)$
because the U.M.V. unbiased estimate of every degree $\le n$ polynomial in
$p_1, \ldots, p_k$ has been written down and no other functions of $p_1, \ldots, p_k$ possess
unbiased estimates.
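The factorial-moment identity on which this argument rests can be confirmed by exact enumeration for small $n$. The following is a minimal sketch with hypothetical probabilities; the function names are ours and the enumeration is only practical for small $n$ and $k$.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def falling(x, v):
    """Factorial power x^(v) = x (x - 1) ... (x - v + 1); x^(0) = 1."""
    out = 1
    for t in range(v):
        out *= x - t
    return out

def pmf(xs, ps):
    """Multinomial probability n! * prod(p_i^x_i / x_i!) as an exact fraction."""
    value = Fraction(factorial(sum(xs)))
    for x, p in zip(xs, ps):
        value *= Fraction(p) ** x / factorial(x)
    return value

def factorial_moment(n, ps, vs):
    """E[X_1^(v_1) ... X_k^(v_k)] by enumerating every outcome with sum n."""
    total = Fraction(0)
    for xs in product(range(n + 1), repeat=len(ps)):
        if sum(xs) == n:
            term = pmf(xs, ps)
            for x, v in zip(xs, vs):
                term *= falling(x, v)
            total += term
    return total

ps = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))   # hypothetical p_1, p_2, p_3
n, vs = 5, (2, 1, 0)
lhs = factorial_moment(n, ps, vs)
rhs = falling(n, sum(vs)) * ps[0] ** 2 * ps[1]
print(lhs, rhs, factorial_moment(3, ps, (2, 2, 0)))
```

The last value illustrates the degenerate case: with $\sum \nu_i = 4 > n = 3$ both sides of the identity vanish.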

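It is now clear that there is no unbiased estimate of $H$. If low bias is what we
want the next best thing would be to use the U.M.V. unbiased estimate of the
degree $n$ polynomial which is in some sense (smallest maximum distance apart,
for example) closest to $H$. A much more easily obtained polynomial which agrees
quite closely with $H$ is the terms of degree $\le n$ in the Taylor series expansion of $H$
about the point $(\tfrac{1}{2}, \ldots, \tfrac{1}{2})$. We will consider use of the U.M.V. unbiased
estimate of this polynomial as an estimate for $H$.

Writing $y_i = p_i - \tfrac{1}{2}$ we have

$$p_i \log p_i = (\tfrac{1}{2} + y_i)\log(\tfrac{1}{2} + y_i)
= -(\tfrac{1}{2} + y_i)\log 2 + \tfrac{1}{2}\Big\{2y_i + \frac{(2y_i)^2}{2 \cdot 1} - \frac{(2y_i)^3}{3 \cdot 2} + \frac{(2y_i)^4}{4 \cdot 3} - \cdots\Big\}$$

$$-C\sum_{i=1}^{k} p_i \log p_i = 1 - \frac{C}{2}\Big\{\sum_{i=1}^{k} (2y_i) + \sum_{a=2}^{\infty}\sum_{i=1}^{k} (-2y_i)^a / a^{(2)}\Big\}
= 1 - \frac{C}{2}\Big\{(2 - k) + \sum_{a=2}^{\infty}\sum_{i=1}^{k} (-2y_i)^a / a^{(2)}\Big\}.$$

Here $|2y_i| \le 1$ so all series converge absolutely and can be rearranged. Also,
$\sum_{i=1}^{k} (2y_i) = 2 - k$. For any integer $r \le n$ we now write

$$H_r = 1 - \frac{C}{2}\Big\{(2 - k) + \sum_{a=2}^{r}\sum_{i=1}^{k} (-2y_i)^a / a^{(2)}\Big\}
= 1 - \frac{C}{2}\Big\{(2 - k) + \sum_{a=2}^{r}\sum_{i=1}^{k}\sum_{\nu=0}^{a} \frac{1}{a^{(2)}}\binom{a}{\nu}(-2p_i)^{\nu}\Big\}.$$

The U.M.V. unbiased estimate of $H_r$ is

$$Z_r = 1 - \frac{C}{2}\Big\{(2 - k) + \sum_{a=2}^{r}\sum_{i=1}^{k}\sum_{\nu=0}^{a} \frac{1}{a^{(2)}}\binom{a}{\nu}(-2)^{\nu}\frac{X_i^{(\nu)}}{n^{(\nu)}}\Big\}.$$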
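A short computation can confirm that an estimate of this kind is unbiased for $H_r$. The sketch below assumes that $Z_r$ is obtained from $H_r$ by expanding $(-2y_i)^a = (1 - 2p_i)^a$ binomially and replacing each $p_i^{\nu}$ by $X_i^{(\nu)}/n^{(\nu)}$, as the preceding section indicates; the probabilities and the values of $n$ and $r$ are hypothetical, and the function names are ours.

```python
from itertools import product
from math import comb, factorial, log

C = 1 / log(2)   # C = log_2 e

def falling(x, v):
    """Factorial power x^(v) = x (x - 1) ... (x - v + 1); x^(0) = 1."""
    out = 1
    for t in range(v):
        out *= x - t
    return out

def H_r(ps, r):
    """Taylor polynomial of H about (1/2, ..., 1/2), terms of degree <= r."""
    k = len(ps)
    series = sum((1 - 2 * p) ** a / (a * (a - 1))
                 for a in range(2, r + 1) for p in ps)
    return 1 - C / 2 * ((2 - k) + series)

def Z_r(xs, r):
    """Estimate of H_r from the counts xs; requires r <= n = sum(xs)."""
    n, k = sum(xs), len(xs)
    series = sum(comb(a, v) * (-2) ** v * falling(x, v) / falling(n, v) / (a * (a - 1))
                 for a in range(2, r + 1) for x in xs for v in range(a + 1))
    return 1 - C / 2 * ((2 - k) + series)

def expected_Z_r(n, ps, r):
    """Exact expectation of Z_r under the multinomial distribution (1)."""
    total = 0.0
    for xs in product(range(n + 1), repeat=len(ps)):
        if sum(xs) == n:
            prob = factorial(n)
            for x, p in zip(xs, ps):
                prob *= p ** x / factorial(x)
            total += prob * Z_r(xs, r)
    return total

ps = (0.2, 0.3, 0.5)   # hypothetical probabilities
print(expected_Z_r(4, ps, 4), H_r(ps, 4))
```

The two printed values agree to rounding error, which is the unbiasedness of $Z_r$ for $H_r$ under the stated assumption.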





The bias of $Z_r$ as an estimate for $H$ is

$$B_r = EZ_r - H = H_r - H = \frac{C}{2}\sum_{a=r+1}^{\infty}\sum_{i=1}^{k} (-2y_i)^a / a^{(2)}.$$

Now $u^s$ is a convex function of $u$ for $s$ an even integer, and $-u^s$ is convex on
$u < 0$ for $s$ an odd integer. From this it is easily shown that if $\sum_{i=1}^{k} u_i = 2 - k$
and $|u_i| \le 1$, then

$$k(1 - 2/k)^s \le \sum_{i=1}^{k} (-u_i)^s \le k - 1 + (-1)^s.$$

These lower, upper bounds are achieved by the choices $(u_1, \ldots, u_k) =
(2/k - 1, \ldots, 2/k - 1)$ and $(1, -1, \ldots, -1)$ respectively. Applying this to
the series for $B_r$ gives

$$\frac{C}{2}\sum_{a=r+1}^{\infty} \frac{k(1 - 2/k)^a}{a^{(2)}} \le B_r \le \frac{C}{2}\sum_{a=r+1}^{\infty} \frac{k - 1 + (-1)^a}{a^{(2)}}.$$

This lower bound is achieved when the $p_i$'s are all $1/k$, and the upper bound is
achieved when some $p_i = 1$. If $k > 2$ this lower bound is positive and we will use
the estimate

$$Z_r' = Z_r - \frac{C}{2}\sum_{a=r+1}^{\infty} \frac{k(1 - 2/k)^a}{a^{(2)}}$$

instead of $Z_r$ for $H$ because $Z_r'$ has the same variance as $Z_r$ and uniformly smaller
bias. For a fixed set $p_1, \ldots, p_k$ we have

$$B_r < \frac{Ck}{2r}(\max|2y_i|)^{r+1}$$

and the corresponding result for the bias of the improved estimate $Z_r'$. To
compute the variance of $Z_r$ we now use the fact


"1 (t)_ (t) 


xv on > Vi it (xX; si a? ?, 


t=0 





min(¥},¥9) vy” tt) 


7 (¥1) yr (rq) 7 (¥y+ sieadl 
xxvr=e 2 Sx 
t=(0 


Further routine calculations now give 


s xv? 3 xv? , min (¥3,¥%9) yi yc) poteyntes sa ) 
E G5 ee 9) (25 i pi = > 1 V2 Sula, 


t=1 t! n'*) 
7 (¥;) 7 (¥9) 
E x;" _ pn) (Xi _ a2 
“An®? ™ n’? 


nt? ( 
¥3)_ (Pq) . . 
( i), a) -1) Pi Pi; tj. 
nm n 


| 










From these, the variance of Z, is seen to be 


var Z, = vn — H,* 
TEE) Se & (Ge - *)} 


-2> 2, p> EC Ys om soe Pi Pis nen 1 


+ a oe: ¥2) vy < per (1 — pi)'\ 

fam tend t! n) 
Grouping together terms of like order in $n$, the asymptotic variance as $n \to \infty$ of the sequence $\{Z_n\}$ of estimates is seen to be

\[ \operatorname{var}(Z_n) \sim \frac{C^2}{n}\sum_{i=1}^{k} p_i\Bigl(\log p_i - \sum_{j=1}^{k} p_j\log p_j\Bigr)^2 \]

provided the non-zero $p_i$'s are not all equal; and

\[ \operatorname{var}(Z_n) \sim \frac{C^2}{n}\cdot\frac{k^* - 1}{2(n-1)} \]

if the non-zero $p_i$'s are all equal, where $k^*$ is the number of non-zero $p_i$'s. And for bias of this sequence of estimates we have asymptotically as $n \to \infty$


kC 9\"*2 kC a 
seen (1-2) S By S 5- (max |2v|}"". 


Comparison with asymptotic results obtained by Miller and Madow for the maximum likelihood estimates $\{\hat H_n\}$ and the estimates $\{H'_n\} = \{\hat H_n + C(k-1)/2n\}$ shows the following: No asymptotic differences in variance. Asymptotic differences in expected squared error only in the special cases where this has order $1/n^2$ or smaller. Bias is asymptotically much smaller for $\{Z_n\}$ than for $\{\hat H_n\}$ or $\{H'_n\}$ except for the special case when some $p_i = 1$.

For small $n$, numerical checking shows the following: $Z_n$ has a smaller bias than $\hat H_n$ or $H'_n$ over most of the range of $(p_1, \dots, p_k)$. Comparing expected squared error as a whole, $\hat H_n$ is quite poor and there is little to choose between $Z_n$ and $H'_n$: sometimes one seems better, sometimes the other. In these comparisons $Z_n$ and $H'_n$ are modified by substituting $C \log k$ for any value exceeding $C \log k$, since this uniformly reduces expected squared error. For example in Table 1 for $k = 2$, every value exceeding $C \log 2 = 1$ would be replaced by 1. This table gives $Z_n(x_1)$ in the upper part of each column and $H'_n(x_1)$ in the lower part, for $n = 2, 3, \dots, 7$ and all possible $x_1$. Values not tabled are obtained from $Z_n(n - x_1) = Z_n(x_1)$ and $H'_n(n - x_1) = H'_n(x_1)$.
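The maximum likelihood estimate and its corrected version can be computed in a few lines. This is a sketch under stated assumptions: $C = 1/\log 2$, the helper names are illustrative, and the truncation at $C \log k$ follows the modification described above.

```python
from math import log

C = 1 / log(2)  # assumed scale constant, so that C * log(2) = 1


def mle_entropy(counts):
    """Maximum likelihood estimate: -C * sum (x/n) log(x/n) over non-empty cells."""
    n = sum(counts)
    return -C * sum((x / n) * log(x / n) for x in counts if x > 0)


def corrected_entropy(counts):
    """Maximum likelihood estimate plus the correction C (k - 1) / (2 n)."""
    n = sum(counts)
    k = len(counts)
    return mle_entropy(counts) + C * (k - 1) / (2 * n)


def truncate(value, k):
    """Replace any value exceeding C log k by C log k."""
    return min(value, C * log(k))
```

For $k = 2$, `corrected_entropy([1, 2])` and `corrected_entropy([1, 3])` agree to five decimals with the entries 1.15875 and .99162 that appear in Table 1.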

When $k = 2$, minimax estimates (squared error loss function) can be found for small $n$ by the usual method of guessing a least favorable a priori distribution $\lambda$ for $p_1 = p$ and finding the corresponding Bayes estimate $Z_\lambda$. If the risk function




TABLE 1 


Estimates of H for k = 2, with $Z_n(x_1)$ in upper part of each column,
$H'_n(x_1)$ in lower part






































Nl 
2 | 3 4 5 6 7 
0 . 27865 . 27865 . 15843 15843 | .11034 .11034 
1.72135 
1 se087 1.24045 | 1.12022 .92787 .84772 | .74238 
1.12022 | 
2 .36067 1.15875 1.19034 1.12022 | 1.00801 | .96222 
| 1.08516 
3 | 24045 | .99162 1.11524 | 1.12022 | 1-09961 
4 . 18034 .86619 | 1.03852 1.08828 
5 14427 | .77024 | =. 96617 
6 | .12022 | .69472 
7 . 10305 
TABLE 2
Comparison of low bias $Z'_n$ and minimax $Z^*$ estimates of H for k = 2

n    Z*(0), Z*(1)       sup_p R_{Z*}(p)    sup_p R_{Z'_n}(p)
1    1/2, 1/2           .2500              1
2    √2 - 1, 1          .1716              .2602
3    .33673, .94400     .1134              .1444





$R_{Z_\lambda}(p)$ assumes its maximum value with $\lambda$-probability 1 then this is indeed least favorable and $Z_\lambda$ is minimax. Here for $\lambda$ we take unspecified probabilities at $n + 1$ unspecified points one of which is 0, with the restriction that $\lambda$ be symmetric about $p = 1/2$. The points and probabilities are then determined so that $R_{Z_\lambda}(p)$ will have equal maxima at these points. We will compare the risk functions $R_{Z'_n}(p)$ and $R_{Z^*}(p)$ [$Z'_n = Z_n$ except $Z'_n = 1$ when $Z_n > 1$; $Z^*$ is minimax]. One point of interest is the degree to which $\sup_p R_{Z'_n}(p)$ exceeds $\sup_p R_{Z^*}(p)$. This comparison, given in Table 2 for $n = 1, 2, 3$, is of particular interest for small values of $n$ where $Z'_n$ would be expected to show up most poorly. Actually $Z'_n$ does quite well even for $n = 2, 3$: $Z'_n$ seems to deviate from unbiasedness in the direction of being like the minimax estimate.

Similarly, for $k = 2$, we can compare the minimax and U.M.V. unbiased estimates of the chi square dispersion measure $H_2$. Equivalently we can compare the minimax estimate $T^*$ and the U.M.V. unbiased estimate $T$ of the binomial variance $pq$, of which $H_2$ is just a linear function. This comparison is given for $n = 1, 2, \dots, 5$ in Table 3. Note that $T$ is very poor compared to $T^*$ for small $n$, and compares more favorably as $n$ increases. For example the ratio $\sup_p R_T(p)/\sup_p R_{T^*}(p)$ is 5.83 when $n = 2$, and decreases to 3.37 by $n = 5$. The comparison indicates that for small $n$ the minimax estimate for binomial variance is decidedly preferable to the U.M.V. unbiased estimate. The estimate $T^*$ is found in the same way as $Z^*$ except that for $n = 3, 4, 5$ $T^*$ has constant risk and can be more easily found by showing that the only constant risk estimate can be Bayes.

TABLE 3
Comparison of minimax T* and U.M.V. unbiased T estimates
for binomial variance pq

n    T*(0), T*(1), T*(2)       sup_p R_{T*}(p)    T(0), T(1), T(2)    sup_p R_T(p)
1    .125, .125, -             .015625
2    .10355, .25, .10355       .010724            0, 1/2, 0
3    .08333, .25, .25          .006944            0, 1/3, 1/3
4    .07158, .20228, .24584    .005124            0, 1/4, 1/3
5    .06508, .17797, .22841    .004235            0, 1/5, 3/10
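The quoted risk ratios can be checked numerically. The sketch below is illustrative only: it assumes the U.M.V. unbiased estimate has the familiar form $T(x) = x(n-x)/(n(n-1))$, which reproduces the tabled values of $T$, approximates the supremum by a maximum over a grid of $p$ values, and takes $\sup_p R_{T^*}(p)$ from Table 3.

```python
from math import comb


def umvu_variance_estimate(x, n):
    """T(x) = x (n - x) / (n (n - 1)); unbiased for pq when n >= 2 (assumed form)."""
    return x * (n - x) / (n * (n - 1))


def risk(estimate, n, p):
    """Expected squared error of `estimate` for pq under Binomial(n, p)."""
    q = 1 - p
    return sum(
        comb(n, x) * p**x * q ** (n - x) * (estimate(x, n) - p * q) ** 2
        for x in range(n + 1)
    )


def sup_risk(estimate, n, grid=2001):
    """Grid approximation to the supremum of the risk over 0 <= p <= 1."""
    return max(risk(estimate, n, i / (grid - 1)) for i in range(grid))


ratio_n2 = sup_risk(umvu_variance_estimate, 2) / 0.010724  # minimax risk from Table 3
ratio_n5 = sup_risk(umvu_variance_estimate, 5) / 0.004235  # minimax risk from Table 3
```

Under these assumptions `ratio_n2` and `ratio_n5` come out close to the values 5.83 and 3.37 quoted in the text.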


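4. Low bias estimation, k unknown. When $k$ is unknown we shall consider the estimates obtained by acting as though $k$ were equal to the observed number of different $Y$ values and using the estimates of the preceding section. Now we have

\[ Z_r = 1 - \frac{C}{2}\{W_1 + \cdots + W_k\}, \]

where

\[ W_i = \Bigl(\frac{2X_i}{n} - 1\Bigr) + \sum_{a=2}^{r}\frac{1}{a^{(2)}}\sum_{s=0}^{a}\binom{a}{s}(-2)^s\frac{X_i^{(s)}}{n^{(s)}}. \]

The modification of $Z_r$ just suggested for use in the case $k$ unknown is

\[ Z_r^* = 1 - \frac{C}{2}\{W_1^* + \cdots + W_k^*\}, \]

where

\[ W_i^* = W_i \ \text{ if } X_i > 0, \qquad W_i^* = 0 \ \text{ if } X_i = 0. \]

Since $W_i = -1/r$ when $X_i = 0$ we have

\[ P(W_i^* = W_i) = 1 - (1 - p_i)^n, \qquad P(W_i^* = W_i + 1/r) = (1 - p_i)^n, \]

\[ EW_i^* = EW_i + \frac{1}{r}(1 - p_i)^n, \]

\[ EZ_r^* = 1 - \frac{C}{2}\Bigl\{EW_1 + \cdots + EW_k + \frac{1}{r}\sum_{i=1}^{k}(1 - p_i)^n\Bigr\} = EZ_r - \frac{C}{2r}\sum_{i=1}^{k}(1 - p_i)^n = H_r - \frac{C}{2r}\sum_{i=1}^{k}(1 - p_i)^n. \]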



The bias of $Z_r^*$ as an estimate for $H$ is

\[ B_r^* = EZ_r^* - H = H_r - H - \frac{C}{2r}\sum_{i=1}^{k}(1 - p_i)^n = B_r - \frac{C}{2r}\sum_{i=1}^{k}(1 - p_i)^n. \]


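The variance of $Z_r^*$ is found as follows:

\[ \operatorname{var} Z_r^* = \frac{C^2}{4}\sum_{i=1}^{k}\operatorname{var} W_i^* + \frac{C^2}{4}\sum_{i \ne j}\operatorname{cov}(W_i^*, W_j^*), \]

\[ \operatorname{var} W_i^* = E(W_i^* - EW_i^*)^2 \]
\[ = [1 - (1 - p_i)^n]\,E\Bigl\{(W_i - EW_i) - \frac{1}{r}(1 - p_i)^n\Bigr\}^2 + (1 - p_i)^n\,E\Bigl\{(W_i - EW_i) + \frac{1}{r} - \frac{1}{r}(1 - p_i)^n\Bigr\}^2 \]
\[ = E(W_i - EW_i)^2 + \frac{1}{r^2}\{(1 - p_i)^n - (1 - p_i)^{2n}\} \]
\[ = \operatorname{var} W_i + \frac{1}{r^2}\{(1 - p_i)^n - (1 - p_i)^{2n}\}. \]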


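Furthermore, we have for $i \ne j$

\[ P(W_i^*, W_j^* = W_i, W_j) = 1 - (1 - p_i)^n - (1 - p_j)^n + (1 - p_i - p_j)^n, \]
\[ P(W_i^*, W_j^* = W_i + 1/r, W_j) = (1 - p_i)^n - (1 - p_i - p_j)^n, \]
\[ P(W_i^*, W_j^* = W_i, W_j + 1/r) = (1 - p_j)^n - (1 - p_i - p_j)^n, \]
\[ P(W_i^*, W_j^* = W_i + 1/r, W_j + 1/r) = (1 - p_i - p_j)^n. \]

So the covariance of $W_i^*$, $W_j^*$, $i \ne j$, is

\[ \operatorname{cov}(W_i^*, W_j^*) = E(W_i^* - EW_i^*)(W_j^* - EW_j^*) \]
\[ = [1 - (1 - p_i)^n - (1 - p_j)^n + (1 - p_i - p_j)^n]\,E\Bigl\{(W_i - EW_i) - \frac{1}{r}(1 - p_i)^n\Bigr\}\Bigl\{(W_j - EW_j) - \frac{1}{r}(1 - p_j)^n\Bigr\} \]
\[ + [(1 - p_i)^n - (1 - p_i - p_j)^n]\,E\Bigl\{(W_i - EW_i) + \frac{1}{r} - \frac{1}{r}(1 - p_i)^n\Bigr\}\Bigl\{(W_j - EW_j) - \frac{1}{r}(1 - p_j)^n\Bigr\} \]
\[ + [(1 - p_j)^n - (1 - p_i - p_j)^n]\,E\Bigl\{(W_i - EW_i) - \frac{1}{r}(1 - p_i)^n\Bigr\}\Bigl\{(W_j - EW_j) + \frac{1}{r} - \frac{1}{r}(1 - p_j)^n\Bigr\} \]
\[ + (1 - p_i - p_j)^n\,E\Bigl\{(W_i - EW_i) + \frac{1}{r} - \frac{1}{r}(1 - p_i)^n\Bigr\}\Bigl\{(W_j - EW_j) + \frac{1}{r} - \frac{1}{r}(1 - p_j)^n\Bigr\} \]
\[ = \operatorname{cov}(W_i, W_j) + \frac{1}{r^2}\{(1 - p_i - p_j)^n - (1 - p_i)^n(1 - p_j)^n\}. \]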


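Hence

\[ \operatorname{var} Z_r^* = \operatorname{var} Z_r + \frac{C^2}{4r^2}\Bigl\{\sum_{i=1}^{k}\bigl[(1 - p_i)^n - (1 - p_i)^{2n}\bigr] + \sum_{i \ne j}\bigl[(1 - p_i - p_j)^n - (1 - p_i)^n(1 - p_j)^n\bigr]\Bigr\} \]
\[ = \operatorname{var} Z_r + \frac{C^2}{4r^2}\Bigl\{\sum_{i=1}^{k}(1 - p_i)^n - \Bigl[\sum_{i=1}^{k}(1 - p_i)^n\Bigr]^2 + \sum_{i \ne j}(1 - p_i - p_j)^n\Bigr\}. \]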

When only one value of $Y$ is observed, which happens with probability $\sum_{i=1}^{k} p_i^n$, the value of $Z_r^*$ is $(C/2)\{(-1)^r/r - 2\sum_{a=r}^{\infty}(-1)^a/a\}$. Since $u_m > 0$, $u_m \to 0$, $u_m > u_{m+1}$ and $u_m - u_{m+1} > u_{m+1} - u_{m+2}$ together imply convergence of $\sum_{m=1}^{\infty}(-1)^{m+1}u_m$ to a value $> u_1/2$, this value of $Z_r^*$ has the sign of $(-1)^{r+1}$. In the case $r$ even, this negative value should be replaced by 0; bias and variance of the resulting modification of $Z_r^*$ are easily found. This point does not arise in Section 3 because if $k = 1$ is known, $H = 0$ is known and estimation is not needed.

A similar discussion can be given for the estimates $H'^{*}_n = \hat H_n + C(k^* - 1)/2n$. The maximum likelihood estimates $\hat H_n$ do not require knowledge of $k$ so can be used unchanged in the case $k$ unknown.


REFERENCES 


[1] C. E. SHANNON AND W. WEAVER, The Mathematical Theory of Communication, University of Illinois Press, 1949.

[2] G. A. MILLER AND W. G. MADOW, "On the maximum likelihood estimate of the Shannon-Wiener measure of information," Air Force Cambridge Research Center, 1954.

[3] E. L. LEHMANN AND H. SCHEFFÉ, "Completeness, similar regions and unbiased estimation I," Sankhyā, Vol. 10 (1950), pp. 305-340.

[4] D. BLACKWELL AND M. A. GIRSHICK, Theory of Games and Statistical Decisions, John Wiley & Sons, 1954.





UNBIASED SEQUENTIAL ESTIMATION FOR BINOMIAL POPULATIONS¹


By Morris H. DeGroot


Carnegie Institute of Technology 


1. Introduction and summary. The subject of minimum variance unbiased 
estimation has received a great deal of attention in the statistical literature, 
e.g., in the papers of Bahadur [2], Barankin [3], and Stein [14]. The emphasis 
in these papers has typically been placed on the existence and construction of 
minimum variance unbiased estimators when the sampling plan to be used was 
given in advance. 

In this paper, criteria are developed for the selection of an appropriate sam- 
pling plan for the family of binomial distributions. Thus, independent observa- 
tions are to be taken on the random variable U so distributed that 


(1.1) $\Pr\{U = 1 \mid p\} = p, \qquad \Pr\{U = 0 \mid p\} = 1 - p = q,$


where p lies in the open interval 0 < p < 1, and the value of a given function, 
g(p), is to be estimated. The problem considered here is that of determining a 
sampling plan and an unbiased estimator of g(p) that are optimal, in some sense, 
at a specified value, po , of p. Optimality will depend, not only on the variance of 
the estimator, but also on the average sample size of the plan. A sampling plan, 
S, and estimator, f, will be considered optimal at po if, among all procedures 
with average sample size at po no larger than that of S, there does not exist an 
unbiased estimator with smaller variance at po than that of f. 

The basic tool to be used is the information inequality (see Lemma 2.7 and 
the discussion following it) which provides a lower bound for the variance of an 
estimator in terms of its expected value and the average sample size of the sam- 
pling plan. If, at po , this lower bound is attained for a particular estimator and 
sampling plan, it may be immediately concluded that they are optimal at po . 
Such an estimator is said to be efficient at po . 

In Section 2, various definitions, assumptions, and fundamental facts to be 
used throughout the paper are collected. 

In Section 3 it is shown that the single sample plans and the inverse binomial 
sampling plans are the only ones that admit an estimator that is efficient at all 
values of p. 

In Section 4 some techniques are given that are often useful in the analysis of 
inverse binomial sampling plans. 

In Section 5 relationships between the average sample size of a sampling plan 
and the functions of p that are estimable optimally are explored. 


Received May 26, 1958; revised September 8, 1958. 

1 This paper is based on a dissertation submitted to the Department of Statistics, Uni- 
versity of Chicago, in partial fulfillment of the requirements for the Ph.D. degree. The 
research was sponsored in part by the Statistics Branch, Office of Naval Research. Reproduction in whole or in part is permitted for any purpose of the United States Government.







BINOMIAL SEQUENTIAL ESTIMATION 81 


In Sections 6 and 7 it is shown that there can exist two distinct sampling plans 
with the same average sample size for all p and some comparisons are made of 
such plans. 

In Section 8 a new characterization of completeness is given for bounded 
sampling plans and it is shown that the dimension of the linear space of unbiased 
estimators of 0 can be determined simply by counting the number of boundary 
points. It is further shown that for a wide class of plans, the estimators that are 
efficient at a given value of p do not have uniformly minimum variance, although 
non-trivial uniformly minimum variance estimators do exist. 

After this paper had been accepted for publication I learned that R. B. 
Dawson had obtained expression (4.13) and Theorems 8.2 and 8.4, as well as 
various other interesting results related to the material of Section 8, in his 
Ph.D. thesis, “Unbiased tests, unbiased estimators, and randomized similar 
regions,” Harvard University, May, 1953. 


2. Definitions, assumptions, and fundamental facts. A formal definition of 
the sampling plans to be considered in the paper will now be given. It will be 
helpful to keep in mind the interpretation of a sampling plan as a rule that 
specifies at each stage of a sequential sampling process whether sampling is to 
cease or another observation is to be taken. Furthermore, it will be helpful to 
visualize a sequential sample as a path in the Euclidean plane—the path starting 
at the origin and being extended at a given stage one unit in either the horizontal 
or vertical direction according as the observation at that stage is 0 or 1. Further 
discussion of these interpretations will be given below. A formal description of 
binomial sampling plans was first given by Girshick, Mosteller, and Savage in 
[7], and the following discussion utilizes several of the concepts presented in 
that paper and the paper of Lehmann and Stein [11]. In what follows, the word
point refers only to points y of the Euclidean plane whose coordinates X(y)
and Y(y) are non-negative integers.

A sampling plan is a function S defined on the points y, taking only the values 0 and 1, and such that for the point $\theta$ with coordinates $X(\theta) = 0$ and $Y(\theta) = 0$, $S(\theta) = 1$.

A path to y is a sequence of points $\theta = y_0, y_1, \dots, y_n = y$ such that $S(y_k) = 1$ for $k = 0, 1, \dots, n - 1$ and either

(2.1) $X(y_{k+1}) = X(y_k) + 1, \qquad Y(y_{k+1}) = Y(y_k),$

or

(2.2) $X(y_{k+1}) = X(y_k), \qquad Y(y_{k+1}) = Y(y_k) + 1.$

Thus, each point of the sequence is either one unit to the right or one unit above its predecessor.










Under a given sampling plan, y is a boundary point if there exists a path to y and S(y) = 0. It is a continuation point if there exists a path to y and S(y) = 1, so that there is also at least one path "through" y. It should be noted that the origin $\theta$ is always a continuation point.

A point y is an inaccessible point if there does not exist any path to y. Thus, 
every point can be uniquely classified as a continuation, inaccessible, or boundary 
point. 

The sample size N(y) of any point y is the sum of its coordinates, X(y) + Y(y). 

The boundary B is the set of all boundary points. 

Clearly, the values of S at inaccessible points are irrelevant to the sampling 
process. However, some discussions to be given later will be simplified if it is 
assumed once and for all that S(y) = 1 at all inaccessible points. It should be 
clear that under these conditions S is completely determined by the set of 
boundary points. 

The sampling plans just discussed are somewhat restricted in two respects. The first is that for a given sequence of observations $U_1, \dots, U_m$, the decision as to whether another observation is taken depends only on the point y reached by the sample path prescribed by $U_1, \dots, U_m$; i.e., the decision depends only on $m$ and $\sum_{i=1}^{m} U_i$. The justification for considering only such plans is provided by the fact that the sequence $\sum_{i=1}^{m} U_i$, $m = 1, 2, \dots$, is both sufficient and transitive. A thorough discussion of these concepts is given by Bahadur in [1]. Similar considerations, especially the theorem of Blackwell [4], justify the definition of an estimator, to be given below, as a statistic that depends on the observed sample sequence only through the boundary point reached by the sequence.

The second restriction on the sampling plans is that they are non-randomized. 
A randomized sampling plan is a function S defined on the points y such that S 
may take any value in the closed unit interval. If, at a given stage in the sampling 
process, the point y is reached, then another observation is taken with probability S(y) and sampling is terminated with probability 1 - S(y). In this paper, only plans for which S(y) is 0 or 1 are considered.

The probability of reaching a particular boundary point y is $K(y)p^{Y(y)}q^{X(y)}$ where K(y) is the number of distinct paths to y.

A sampling plan is said to be closed if

(2.3) $\sum_{y \in B} K(y)p^{Y(y)}q^{X(y)} = 1$

for all p, 0 < p < 1. Only closed plans are considered.
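The path counts K(y) and the closure condition (2.3) can be explored by a simple forward recursion over sample sizes. The sketch below is illustrative, not part of the paper: the plan is supplied as a hypothetical predicate `is_boundary(x, y)`, only points of sample size at most `max_n` are visited, and exact rational arithmetic is used so that the sum in (2.3) can be compared with 1 without rounding.

```python
from collections import defaultdict
from fractions import Fraction


def path_counts(is_boundary, max_n):
    """Return {(x, y): K} for the boundary points of sample size <= max_n."""
    level = {(0, 0): 1}  # the origin is always a continuation point
    K = {}
    for _ in range(max_n):
        reached = defaultdict(int)
        for (x, y), paths in level.items():
            # An observation of 0 moves one unit right, an observation of 1 one unit up.
            for point in ((x + 1, y), (x, y + 1)):
                reached[point] += paths
        level = {}
        for point, paths in reached.items():
            if is_boundary(*point):
                K[point] = paths      # sampling stops here
            else:
                level[point] = paths  # continuation point
    return K


def reach_probability(K, p):
    """Sum of K(y) p^Y(y) q^X(y) over the boundary points found."""
    q = 1 - p
    return sum(k * p**y * q**x for (x, y), k in K.items())
```

For the single sample plan with n = 4, `reach_probability(path_counts(lambda x, y: x + y == 4, 4), Fraction(1, 3))` equals 1 exactly. For an unbounded plan such as Y(y) = 2, truncating at `max_n` gives the probability of stopping by sample size `max_n`, which approaches 1 as `max_n` grows.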
An estimator f is a real-valued function defined on B. The only estimators to be considered are those for which

(2.4) $E(f \mid p) = \sum_{y \in B} f(y)K(y)p^{Y(y)}q^{X(y)}$

is absolutely convergent.
A sampling plan is complete (boundedly complete) if the only estimator (bounded estimator) f, such that $E(f \mid p) = 0$ for all p, is the one defined by f(y) = 0 for all $y \in B$.

A sampling plan is simple, if, for each positive integer m, the continuation points of sample size m form an interval on the line X(y) + Y(y) = m.

The basic facts concerning completeness follow. Lemma 2.1 was developed by Girshick, Mosteller, Savage, and Wolfowitz in a sequence of papers [7], [17], [12]. Lemma 2.2 is due to Lehmann and Stein [11].

LEMMA 2.1. A necessary and sufficient condition for a closed sampling plan to be boundedly complete is that it be simple.

LEMMA 2.2. A necessary and sufficient condition for a closed plan to be complete is that it be simple and that the conversion of any boundary point to a continuation point destroy closure.

The functions defined on B and taking the values X(y), Y(y), and N(y) are denoted by X, Y, and N, respectively.

The following conditions are assumed to hold throughout the paper:

(i) For every sampling plan to be considered,

(2.5) $E(N^2 \mid p) = \sum_{y \in B} N^2(y)K(y)p^{Y(y)}q^{X(y)}$

is uniformly convergent on every closed interval of values of p;
(ii) For every estimator f to be considered,

(2.6) $g(p) = E(f \mid p) = \sum_{y \in B} f(y)K(y)p^{Y(y)}q^{X(y)}$

is differentiable termwise in the open interval, 0 < p < 1, and the derived series is absolutely convergent.

A well-known, elementary, and useful sufficient condition for the termwise 
differentiability of the series in (2.6) is that the formal termwise derivative be 
absolutely uniformly convergent on every closed subinterval. This condition is 
in turn very often verified by a dominance argument (see, e.g., [5], pp. 392, 
396). 

Some easy but important consequences of the above assumptions will now be 
given. Some of the results are well-known and have been given elsewhere. 

LEMMA 2.3. $E(Y^2 \mid p)$, $E(X^2 \mid p)$, and $E(XY \mid p)$ all exist and are at most $E(N^2 \mid p)$.

PROOF. Since $0 \le X \le N$ and $0 \le Y \le N$, the results follow from (i).

LEMMA 2.4. N, X, and Y, considered as estimators, satisfy (ii).

PROOF. $E(N \mid p) = \sum_{y \in B} N(y)K(y)p^{Y(y)}q^{X(y)}$. The formal termwise derivative of this series is less, in absolute value, than

$\frac{1}{pq}\sum_{y \in B} N^2(y)K(y)p^{Y(y)}q^{X(y)}.$

It follows from (i) that this series converges uniformly on every closed interval and, hence, according to the remarks following (ii), $E(N \mid p)$ is termwise differentiable. The proofs for X and Y are similar.





LEMMA 2.5. If f satisfies (ii) and $E(f \mid p) = g(p)$, then $E[(qY - pX)f \mid p] = pq\,g'(p)$.

PROOF. Termwise differentiation of the series for g(p) yields the result.

LEMMA 2.6. $qE(Y \mid p) = pE(X \mid p) = E[(qY - pX)^2 \mid p] = pqE(N \mid p)$.

Proor. These results can be derived directly from Lemma 2.5. However, they 
are well-known and are basic to various aspects of sequential analysis. As a 
result, they have been derived under various conditions; e.g., by Wald in the 
Appendix of [16] as consequences of the fundamental identity of sequential 
analysis, and by Wolfowitz in [18]. 

LEMMA 2.7. For any estimator f,

(2.7) $\operatorname{Var}(f \mid p) \ge \dfrac{pq[g'(p)]^2}{E(N \mid p)}.$

Equality holds at a particular value of p, say $p_0$, if and only if there exist constants a and b such that $f(y) = a[q_0Y(y) - p_0X(y)] + b$ for all boundary points y, where $q_0 = 1 - p_0$.

This lemma is also well-known, especially for single sample plans. A brief history of this inequality with references is given by Savage in [13], p. 238. It was first proved for sequential plans by Wolfowitz [18]. Following Savage, (2.7) will be called the information inequality.

A non-constant estimator f is said to be efficient at po if equality holds in the 
information inequality when p = po. The expected value of f is then estimable 
efficiently at po . 

If, for a given sampling plan S, a non-constant estimator f is efficient at p for all p, then both f and S are called efficient. From Lemma 2.7, it is seen that f is efficient if and only if there exist two functions of p, a(p) and b(p), with $a(p) \ne 0$, such that

(2.8) $f(y) = a(p)[qY(y) - pX(y)] + b(p)$

for all p and all boundary points y.
Two types of sampling plans of prime importance are the single sample plans 
and the inverse binomial sampling plans. In a single sample plan, 


B= {y:N(y) = n} 


for some positive integer n. It is clear that such a plan satisfies (i) since the series 
involved contains only a finite number of non-zero terms. 

In an inverse binomial sampling plan, either $B = \{y : Y(y) = c\}$ for some positive integer c, or $B = \{y : X(y) = c\}$. This type of plan was first treated formally by Haldane in [8] and [9]; the name "inverse binomial sampling" was suggested by Tweedie [15]. It is easily seen that these plans are closed. The next lemma shows that assumption (i) is also satisfied.

LEMMA 2.8. For an inverse binomial sampling plan $E(N^2 \mid p)$ converges uniformly on every closed interval.

PROOF. Consider the plan for which $B = \{y : Y(y) = c\}$ for a given positive integer c. Then

$\Pr\{N = c + j \mid p\} = \binom{c + j - 1}{j}p^cq^j, \qquad j = 0, 1, \dots,$

and

$E(N^2 \mid p) = \sum_{j=0}^{\infty}(c + j)^2\binom{c + j - 1}{j}p^cq^j.$

On the closed interval $0 < \delta \le p \le \epsilon < 1$,

$E(N^2 \mid p) \le \epsilon^c\sum_{j=0}^{\infty}(c + j)^2\binom{c + j - 1}{j}(1 - \delta)^j,$

and the series on the right is convergent, as is readily checked by the ratio test. Hence, the given series converges uniformly on the closed interval. The proof is entirely analogous when $B = \{y : X(y) = c\}$.

Because of the duality between the two types of inverse binomial sampling plans, any facts stated in the paper concerning these plans will be demonstrated only for the plan where $B = \{y : Y(y) = c\}$. The analogous proofs for the plan in which $B = \{y : X(y) = c\}$ can always be obtained simply by interchanging the roles of X and Y and of p and q.


3. Efficient sampling plans. In this section it is shown that the only efficient 
sampling plans are the single sample plans and the inverse binomial sampling 
plans. For a single sample plan, the efficient estimators are the non-constant 
linear functions of Y, and hence, the functions of p that are estimable efficiently 


are linear in p. For an inverse binomial sampling plan the efficient estimators are 
linear functions of N, and their expectations are linear in either 1/p or 1/q. 
LEMMA 3.1. Let S be a given sampling plan for which there exists a non-constant estimator f that is efficient at two values of p. Then there exist constants $\mu$, $\nu$, and $\xi$, not all 0, such that $\mu X(y) + \nu Y(y) = \xi$ for all boundary points y.

PROOF. Suppose f is efficient at $p_0$ and $p_1$. Then, from Lemma 2.7, there exist constants $a_0$, $b_0$, $a_1$, and $b_1$ such that

(3.1) $f(y) = a_0[q_0Y(y) - p_0X(y)] + b_0 = a_1[q_1Y(y) - p_1X(y)] + b_1$

for all boundary points y. Hence

(3.2) $(a_0q_0 - a_1q_1)Y(y) - (a_0p_0 - a_1p_1)X(y) = b_1 - b_0,$

and since f is not constant, neither $a_0$ nor $a_1$ is 0. Thus, not both the coefficients of Y(y) and X(y) are 0, and (3.2) is an equation of the required form.

Clearly, the boundary points of the single sample plans and the inverse bi- 
nomial sampling plans lie on straight lines as the lemma demands. The following 
theorem shows that they are the only plans for which this is true. 

THEOREM 3.1. Let S be a given closed sampling plan for which there exist constants $\mu$, $\nu$, and $\xi$, not all 0, such that $\mu X(y) + \nu Y(y) = \xi$ for all boundary points y. Then S is either a single sample plan or an inverse binomial sampling plan.

PROOF. There is no loss of generality in assuming $\mu \ge 0$. The proof is divided into cases depending on the magnitudes of $\mu$, $\nu$, and $\xi$.

(i) $\mu = \nu$: If $\xi/\mu$ is not a positive integer, then there are no boundary points, and S is not closed. If $\xi/\mu = n$, a positive integer, then B is a subset of the single sample plan with boundary $B^* = \{y : X(y) + Y(y) = n\}$. Clearly, if S is closed, then $B = B^*$.

(ii) $\mu = 0$: If $\xi/\nu$ is not a positive integer, then again B is empty, and S is not closed. If $\xi/\nu = c$, a positive integer, then B is a subset of the inverse binomial sampling plan with boundary $B^* = \{y : Y(y) = c\}$. Again, if S is closed, then $B = B^*$.

(iii) $\nu = 0$: As in (ii), B must be of the form $B = \{y : X(y) = c\}$.

(iv) $\mu > 0$, $\nu > 0$, $\mu \ne \nu$: If $\xi \le 0$, B is empty. If $\xi > 0$, there exists a point y such that $\mu X(y) + \nu Y(y) < \xi$ and either $\mu[X(y) + 1] + \nu Y(y) > \xi$ or $\mu X(y) + \nu[Y(y) + 1] > \xi$. Thus, with positive probability, the boundary is passed and S is not closed.

(v) $\mu > 0$, $\nu < 0$: Suppose $\xi > 0$. By the strong law of large numbers ([6], p. 243), there is positive probability that, for $\epsilon > 0$,

(3.3) $qY(y) - pX(y) > -\epsilon N(y)$

for every point y reached by the sample path. But for p sufficiently large and $\epsilon$ sufficiently small the line $qY(y) - pX(y) = -\epsilon N(y)$ lies entirely above the line $\mu X(y) + \nu Y(y) = \xi$. Hence, no point satisfying the inequality (3.3) can be a boundary point, and there is positive probability that the boundary will not be reached. Thus, S is not closed. Analogous arguments hold if $\xi < 0$ or $\xi = 0$.

The main result of this section is 

THEOREM 3.2. The only efficient sampling plans are the single sample plans and the inverse binomial sampling plans. When $B = \{y : N(y) = n\}$, any non-constant function of the form a + bY is an efficient estimator of a + bnp, and these are the only efficient estimators. When $B = \{y : Y(y) = c\}$, any non-constant function of the form a + bN is an efficient estimator of a + bc(1/p), and these are the only efficient estimators. When $B = \{y : X(y) = c\}$, any non-constant function of the form a + bN is an efficient estimator of a + bc(1/q), and these are the only efficient estimators.

PROOF. Lemma 3.1 and Theorem 3.1 show that the only procedures satisfying certain conditions necessary for a sampling plan to be efficient are the single sample plans and the inverse binomial sampling plans. It remains to show that these procedures are indeed efficient.

Let $B = \{y : N(y) = n\}$. Then, for every $y \in B$, $qY(y) - pX(y) = Y(y) - np$, and hence, $Y(y) = [qY(y) - pX(y)] + np$. This demonstrates the efficiency of Y. Since $E(qY - pX \mid p) = 0$, $E(Y \mid p) = np$.

Let $B = \{y : Y(y) = c\}$. Then, for every $y \in B$, $qY(y) - pX(y) = c - pN(y)$, and hence, $N(y) = -(1/p)[qY(y) - pX(y)] + (c/p)$. Thus, N is efficient and $E(N \mid p) = c/p$.





The proof is completed by noting that if f is an efficient estimator, then so is every non-constant linear function of f. Furthermore, if two non-constant estimators are both efficient at a given value of p, then they are linearly related.

COROLLARY 3.1. A non-constant estimator is efficient if and only if it is efficient at two distinct values of p.

PROOF. From Lemma 3.1 and Theorems 3.1 and 3.2, it is seen that if f is not
constant and is efficient at two values of p then the sampling plan admits an 
efficient estimator. From the comments at the end of the proof of Theorem 3.2, 
it follows that this estimator is linearly related to f, and hence, that f itself is 
efficient. The proof in the other direction is trivial. 

4. Inverse binomial sampling plans. Because of Theorem 3.2, it seems worth- 
while to investigate the properties of single sample and inverse binomial sampling 
plans in some detail. 

For the single sample plan with boundary $B = \{y : N(y) = n\}$, Y/n is an efficient estimator of p with $\operatorname{Var}(Y/n \mid p) = pq/n$. For any estimator f,

(4.1) $E(f \mid p) = \sum_{k=0}^{n} f(y_k)\binom{n}{k}p^kq^{n-k},$

where $y_k$ denotes the boundary point with $Y(y_k) = k$. This is a polynomial in p of degree at most n. Thus, only polynomials of degree at most n are estimable unbiasedly, and since

(4.2) $E\left[\dfrac{Y(Y - 1)\cdots(Y - k)}{n(n - 1)\cdots(n - k)} \,\Big|\, p\right] = p^{k+1}$

for $k = 0, 1, \dots, n - 1$, then every such polynomial is estimable.
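The unbiasedness of the factorial ratio for $p^{k+1}$ under a single sample plan can be confirmed by exact enumeration. This is a small illustrative check, with hypothetical function names, using rational arithmetic so that equality is exact.

```python
from fractions import Fraction
from math import comb


def falling(x, s):
    """Falling factorial x(x-1)...(x-s+1), with the empty product equal to 1."""
    out = 1
    for j in range(s):
        out *= x - j
    return out


def expected_factorial_ratio(n, k, p):
    """E[Y(Y-1)...(Y-k) / (n(n-1)...(n-k)) | p] for Y ~ Binomial(n, p), 0 <= k <= n-1."""
    q = 1 - p
    return sum(
        Fraction(falling(y, k + 1), falling(n, k + 1)) * comb(n, y) * p**y * q ** (n - y)
        for y in range(n + 1)
    )
```

For example, with n = 6 and p = `Fraction(2, 7)`, `expected_factorial_ratio(6, k, p)` equals `p ** (k + 1)` for every k from 0 to 5.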
The analogous properties of an inverse binomial sampling plan are less familiar 
and will be discussed here. 
Consider the plan with boundary $B = \{y : Y(y) = c\}$. For each non-negative integer k, there exists a unique point $y_k$ of B such that $N(y_k) = c + k$. Since

(4.3) $\Pr\{N = c + k \mid p\} = \binom{k + c - 1}{k}p^cq^k, \qquad k = 0, 1, 2, \dots,$

then for any estimator f,

(4.4) $E(f \mid p) = p^c\sum_{k=0}^{\infty} f(y_k)\binom{k + c - 1}{k}q^k.$


The class of functions that are estimable unbiasedly and their estimators 
will now be determined. For convenience, the functions are written as functions 
of q rather than of p. 

THEOREM 4.1. A function h(q) is estimable unbiasedly if and only if it can be expanded in Taylor's series in the interval |q| < 1. If h(q) is estimable, then its unique estimator is given by

(4.5) $f(y_k) = \dfrac{(c - 1)!}{(k + c - 1)!}\,\dfrac{d^k}{dq^k}\left[\dfrac{h(q)}{(1 - q)^c}\right]_{q=0}, \qquad k = 0, 1, 2, \dots.$

PROOF. If h(q) can be expanded in Taylor's series in the given interval, then so also can $h(q)/(1 - q)^c$, and conversely. Thus, suppose

$\dfrac{h(q)}{(1 - q)^c} = \sum_{k=0}^{\infty} b_kq^k.$

Then

$h(q) = p^c\sum_{k=0}^{\infty} b_kq^k,$

and taking

$f(y_k) = b_k \Big/ \binom{k + c - 1}{k}$

yields an estimator f with $E(f \mid p) = h(q)$.

Suppose now that h(q) is estimable unbiasedly. Then there exists an estimator f such that

$h(q) = p^c\sum_{k=0}^{\infty} f(y_k)\binom{k + c - 1}{k}q^k.$

Replacing p by 1 - q gives the required expansion. Thus,

$\dfrac{h(q)}{(1 - q)^c} = \sum_{k=0}^{\infty} f(y_k)\binom{k + c - 1}{k}q^k,$

$f(y_k)\binom{k + c - 1}{k} = \dfrac{1}{k!}\,\dfrac{d^k}{dq^k}\left[\dfrac{h(q)}{(1 - q)^c}\right]_{q=0},$

and hence

$f(y_k) = \dfrac{(c - 1)!}{(k + c - 1)!}\,\dfrac{d^k}{dq^k}\left[\dfrac{h(q)}{(1 - q)^c}\right]_{q=0}.$

The uniqueness of f follows from the uniqueness of the Taylor’s series expansion, 
which is of course the basis of the completeness of this sampling plan. 
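The construction in the proof translates directly into a coefficient computation. The sketch below is illustrative and assumes h is supplied through its Taylor coefficients in q (a hypothetical list `h_coeffs`, truncated if h is not a polynomial); it uses the expansion $1/(1-q)^c = \sum_j \binom{j+c-1}{j} q^j$ to obtain $b_k$ and then divides by $\binom{k+c-1}{k}$.

```python
from fractions import Fraction
from math import comb


def unbiased_estimator(h_coeffs, c, k_max):
    """Values f(y_k), k = 0..k_max, for h(q) = sum_j h_coeffs[j] q^j when B = {Y = c}.

    If h is not a polynomial, h_coeffs must contain at least k_max + 1 coefficients.
    """
    values = []
    for k in range(k_max + 1):
        # b_k: coefficient of q^k in h(q) / (1 - q)^c, by convolution.
        b_k = sum(
            Fraction(h_coeffs[j]) * comb(k - j + c - 1, k - j)
            for j in range(min(k, len(h_coeffs) - 1) + 1)
        )
        values.append(b_k / comb(k + c - 1, k))
    return values
```

With h(q) = 1 - q = p and c = 3, `unbiased_estimator([1, -1], 3, 3)` returns 1, 2/3, 1/2, 2/5, that is (c - 1)/(k + c - 1). With h(q) = 1/(1 - q) = 1/p (all coefficients equal to 1), the values are (k + c)/c, that is N/c.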

Given h(q), Theorem 4.1 provides a rule for finding its unbiased estimator. 
It is often possible to find the expectation of a given estimator in closed form by 
using the fact that if the series 





(4.6) $\phi(x) = \sum_{k=0}^{\infty} b_kx^k$

is differentiated m times within its interval of convergence, then

(4.7) $\dfrac{\phi^{(m)}(x)}{m!} = \sum_{k=0}^{\infty} b_{k+m}\binom{k + m}{m}x^k.$

As illustrations of the technique involved, the variance of the unbiased estimator 
of p and the moment generating function of N will be determined. 





It is well-known (and can be checked from Theorem 4.1) that if f is given by

(4.8) $f(y_k) = (c - 1)/(k + c - 1),$

then, for $c \ge 2$, f is an unbiased estimator of p. It should be emphasized that this is not an efficient estimator. Using the result given above,


E( f*| p) = pe - . eeecp ars "\¢ 


. ‘te Be ao 
“Pann amielEEe|- 


Note that the constant term in the last series of (4.9) is taken to be 0. Its value 
can be assigned arbitrarily since it does not appear in the derived series. But 


dizvsl |] Glan Il, 1, 
Hence, 


(4.11) fy le = [so-2], 


and 


(4.9) 


(4.12) E(f’? |p) = 


on a 


Using the easily verified fact that 


ie] - EC )Lag ee - | ZG) 


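one obtains

$\dfrac{d^m}{dq^m}\left[\dfrac{\log(1 - q)}{q}\right] = \dfrac{(-1)^m m!\,\log(1 - q)}{q^{m+1}} + m!\sum_{i=1}^{m}\dfrac{(-1)^{m-i+1}}{i(1 - q)^iq^{m-i+1}}.$

Thus, referring to (4.12),

(4.13) $E(f^2 \mid p) = \dfrac{(-1)^{c-1}(c - 1)p^c\log(1 - q)}{q^{c-1}} + (c - 1)p^c\sum_{i=1}^{c-2}\dfrac{(-1)^{c-i}}{i\,p^iq^{c-1-i}} = \dfrac{(c - 1)p^c}{q^{c-1}}\left[(-1)^{c-1}\log p + \sum_{i=1}^{c-2}\dfrac{(-1)^{c-i}}{i}\left(\dfrac{q}{p}\right)^i\right].$

It is interesting to note that Haldane [9] gives $E(f^2 \mid p)$ in the form of an integral that, in order to be evaluated, requires repeated integration by parts.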
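A closed form for $E(f^2 \mid p)$ can be compared numerically with the defining series. The sketch below is illustrative: it takes $f(y_k) = (c-1)/(k+c-1)$, sums the series directly with a finite number of terms, and evaluates the closed form $\frac{(c-1)p^c}{q^{c-1}}\bigl[(-1)^{c-1}\log p + \sum_{i=1}^{c-2}\frac{(-1)^{c-i}}{i}(q/p)^i\bigr]$, which is the form assumed here for the comparison.

```python
from math import comb, log


def second_moment_series(c, p, terms=2000):
    """E(f^2 | p) summed directly over the distribution of N - c (truncated series)."""
    q = 1 - p
    return p**c * sum(
        ((c - 1) / (k + c - 1)) ** 2 * comb(k + c - 1, k) * q**k
        for k in range(terms)
    )


def second_moment_closed(c, p):
    """Closed form assumed above, for c >= 2 and 0 < p < 1."""
    q = 1 - p
    bracket = (-1) ** (c - 1) * log(p) + sum(
        (-1) ** (c - i) / i * (q / p) ** i for i in range(1, c - 1)
    )
    return (c - 1) * p**c / q ** (c - 1) * bracket
```

For moderate c and p not too close to 0 the two functions agree to many decimal places; for instance both give about 0.306853 at c = 3, p = 1/2.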

In Section 3 it was shown that N is an efficient estimator of its expected value. Using the technique just illustrated, its moment generating function can be found. For $t < \log(1/q)$,

$E(e^{tN} \mid p) = \sum_{k=0}^{\infty} e^{t(c+k)}\binom{k + c - 1}{k}p^cq^k$

$= (pe^t)^c\sum_{k=0}^{\infty}\binom{k + c - 1}{k}(qe^t)^k$

$= (pe^t)^c\,\dfrac{1}{(c - 1)!}\left[\dfrac{d^{c-1}}{dx^{c-1}}\sum_{k=0}^{\infty}x^k\right]_{x = qe^t}$

$= (pe^t)^c\,\dfrac{1}{(c - 1)!}\left[\dfrac{d^{c-1}}{dx^{c-1}}\,\dfrac{1}{1 - x}\right]_{x = qe^t}$

$= (pe^t)^c\,\dfrac{1}{(c - 1)!}\cdot\dfrac{(c - 1)!}{(1 - qe^t)^c}$

(4.14) $= \left[\dfrac{pe^t}{1 - qe^t}\right]^c.$

It should be noted that N — c has a negative binomial distribution and that 
(4.14) could also be derived using techniques appropriate to that distribution 
(see, e.g., Feller [6], pp. 155, 252). 
Differentiating (4.14) at t = 0 yields 
E(N | p) = c/p 
(4.15) E(N* | p) = (¢' + ¢g)/p° 


2 


(4.14) 


(c — 1)! (1 — ge‘) 


Var(N | p) = cq/p 


Since N is an efficient estimator of c/p, its variance should attain the lower
bound of the information inequality; i.e.,


(4.16)  Var(N | p) = pq[g'(p)]^2 / E(N | p).


Setting g(p) = E(N | p) = c/p does yield the value found in (4.15).
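
The results of this section can be checked numerically. The following is a minimal sketch, not part of the original derivation: it sums the inverse binomial series directly (with k the number of 0's observed before the c-th 1, so that N = k + c) and compares the sums with p, with the closed form for E(f^2 | p), and with the moments in (4.15). The function names and the truncation length `terms` are assumptions made for illustration.

```python
from math import log


def series_moments(c, p, terms=5000):
    """Truncated series for E(f), E(f^2), E(N), Var(N) under inverse binomial sampling."""
    q = 1.0 - p
    prob = p ** c                       # Pr{k = 0}: the first c observations are all 1
    ef = ef2 = en = en2 = 0.0
    for k in range(terms):
        f = (c - 1) / (k + c - 1)       # the estimator (4.8); requires c >= 2
        n = k + c
        ef += f * prob
        ef2 += f * f * prob
        en += n * prob
        en2 += n * n * prob
        prob *= q * (k + c) / (k + 1)   # negative binomial recursion in k
    return ef, ef2, en, en2 - en * en


def closed_form_second_moment(c, p):
    """Closed form for E(f^2 | p) obtained by differentiating log(1 - q)/q."""
    q = 1.0 - p
    tail = sum((-1) ** (c - i) * (q / p) ** i / i for i in range(1, c - 1))
    return (c - 1) * p ** c / q ** (c - 1) * ((-1) ** (c - 1) * log(p) + tail)


c, p = 3, 0.4
ef, ef2, en, var_n = series_moments(c, p)
print("E(f)   series:", ef, " target p:", p)
print("E(f^2) series:", ef2, " closed form:", closed_form_second_moment(c, p))
print("E(N)   series:", en, " c/p:", c / p)
print("Var(N) series:", var_n, " cq/p^2:", c * (1 - p) / p ** 2)
```

The last comparison is also the information bound (4.16) evaluated at g(p) = c/p, so agreement there illustrates that N attains the bound.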


5. Relationships between sampling plans and estimable functions. For a given
sampling plan, the only estimators that are efficient at a given value p_0 are the
non-constant functions f* of the form


(5.1) f*(γ) = a[q_0Y(γ) − p_0X(γ)] + b,
for some constants a and b and all boundary points γ.
Lemma 5.1. If f* is given by (5.1), then f* is constant if and only if a = 0.
PROOF. If a = 0, then f*(γ) = b for all boundary points γ. If a ≠ 0, then
f* is constant only if q_0Y(γ) − p_0X(γ) is constant for all boundary points γ.
But that would mean that the points of B lie on a straight line and, hence, the
sampling plan must be either a single sample plan or an inverse binomial
sampling plan. It is readily checked that for neither of these is q_0Y(γ) − p_0X(γ)
constant when 0 < p_0 < 1.







Thus, a non-constant estimator f* is efficient at p_0 if and only if it is of the
form (5.1) with a ≠ 0. The next theorem determines the class of functions that
are estimable efficiently at a given point, simply by evaluating E(f* | p).

THEOREM 5.1. For a given sampling plan, a non-constant function g(p) is estimable
efficiently at p_0 if and only if there exist constants a and b with a ≠ 0, such that


(5.2) g(p) = a(p − p_0)E(N | p) + b;
or alternatively, if and only if there exists a constant k, k ≠ 0, such that


(5.3) E(N | p) = k[g(p) − g(p_0)]/(p − p_0) for p ≠ p_0,
      E(N | p_0) = kg'(p_0).
PROOF. By Lemma 2.6,
E(q_0Y − p_0X | p) = E(qY − pX + pN − p_0N | p)
                   = E(qY − pX | p) + (p − p_0)E(N | p)
                   = (p − p_0)E(N | p).


If g(p) is estimable efficiently at p_0 then it must be constant or the expectation of
an estimator of the form (5.1) with a ≠ 0. Hence,


(5.4) g(p) = E(f* | p) = a(p − p_0)E(N | p) + b.


Setting p = p_0 in (5.4) gives b = g(p_0). Differentiating both sides of (5.4) at
p_0 gives g'(p_0) = aE(N | p_0). Thus, both (5.2) and (5.3) are satisfied
(with k = 1/a).

Conversely, if either (5.2) or (5.3) is satisfied then g(p) is the expectation of
an estimator of the form (5.1) with a = 1/k and b = g(p_0).

From Theorem 5.1, it is clear that there does not always exist a sampling
plan that admits estimation of a given function efficiently at a given value of p.
Since E(N | p) can never be smaller than unity, obvious restrictions on g(p)
and p_0 are that [g(p) − g(p_0)]/(p − p_0) cannot change sign in the open interval
0 < p < 1 and that g'(p_0) cannot vanish. However, no general result
characterizing the class of functions φ(p) for which there exist sampling plans with
E(N | p) = φ(p) has, to the author's knowledge, yet been found.


6. Some sampling plans. Theorem 5.1 reveals that the functions g(p) that
are estimable at a given value p_0 depend only on E(N | p) in a very simple way.
From the specific form of this relationship it is seen that if it is desired to estimate
a polynomial in p efficiently at a particular value, then E(N | p) must itself be
a polynomial in p of a certain form. In this section some sampling plans for which
E(N | p) is a polynomial will be described. It should be noted that if, for a given
sampling plan, there exists a positive integer n such that Pr{N ≤ n | p} = 1,
then E(N | p) is a polynomial of degree at most n − 1. One of the interesting
things about the plans given below is that they yield polynomials for E(N | p)
even though they do not satisfy this condition. In particular, one of the
procedures will be an unbounded sequential plan for which E(N | p) is constant.







Hence, considering the single sample plan with the same E(N | p), it may be 
concluded that E(N | p) does not determine the sampling plan. 
Scheme I. Let n and m be two positive integers and let


(6.1) B = {γ: N(γ) = n, Y(γ) < n} ∪ {γ: Y(γ) = n + m}.


That is, a single sample plan of sample size n is used unless all of the observations 
are equal to 1. When this happens, sampling is continued under an inverse bi- 
nomial sampling plan until m additional 1’s are obtained. 

The function E(N | p) for the inverse binomial sampling plan was found
in Section 4. Using this result, it is easy to obtain E(N | p) for the above plan.


(6.2) E(N | p) = E(N | N ≤ n, p)Pr(N ≤ n | p) + E(N | N > n, p)Pr(N > n | p)
               = n(1 − p^n) + [n + (m/p)]p^n = n + mp^{n−1}.


For example suppose n = 2, m = 4. Then E(N | p) = 2 + 4p, and it follows
from Theorem 5.1 that p^2 is estimable efficiently at 1/2. Indeed,


(6.3) E(N | p) = 4(p + 1/2) = 4(p^2 − 1/4)/(p − 1/2)
               = 4[g(p) − g(p_0)]/(p − p_0),


when g(p) = p^2, p_0 = 1/2. In the same way, p^2 + (1/2 − p_0)p is estimable
efficiently at p_0.

If n = 1, then, from (6.2), E(N | p) = m + 1 for all p. From Theorem 5.1 it is
seen that this plan admits an efficient estimator of p at any value p_0. This does
not mean that the plan is efficient; indeed, it is known not to be. Thus, although
for every p_0 there is an estimator of p that is efficient at p_0, there is no single
estimator that is efficient at all values of p. As has been shown, such an estimator
exists only under a single sample plan. From Lemmas 2.1 and 2.2 it is seen that
this plan (and every plan included in Scheme I) is boundedly complete but not
complete.

It is interesting to compare the various estimators of p for the plan with
n = 1. The only bounded unbiased estimator of p is


(6.4) f_0(γ) = [1/(m + 1)]Y(γ),


which equals 0 or 1 according as the first observation is 0 or 1. Hence,
Var(f_0 | p) = pq. Since E(N | p) = m + 1, N − (m + 1) is an unbounded,
unbiased estimator of 0. Hence, any estimator of the form


(6.5) f_c(γ) = [1/(m + 1)]Y(γ) + c[1 − N(γ)/(m + 1)],


for some constant c, is an unbiased estimator of p. It is easily checked that f_{p_0}
is the efficient estimator at p_0. Thus, the efficient estimators of p are unbounded;
this serves as a reminder that unbiased estimators, even efficient ones, are
not necessarily desirable estimators.

Using the moments of N found in Section 4 for the inverse binomial sampling
plan, Var(f_{p_0} | p) is found to be


(6.6) [pq/(m + 1)] [1 + m(p_0/p − 1)^2].


Comparing this result with the variance of the bounded estimator, f_0, it is seen
that Var(f_{p_0} | p) < Var(f_0 | p) for p > p_0/2.
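
The comparison can be reproduced numerically. The sketch below is an illustration only, under the assumption that the estimator has the form f_c(γ) = Y(γ)/(m + 1) + c[1 − N(γ)/(m + 1)] of (6.5) and that the plan is Scheme I with n = 1; the function names and the truncation length `terms` are hypothetical. It sums the exact boundary distribution, checks that f_c is unbiased, and compares the variance of f_{p_0} with (6.6) and with pq.

```python
def scheme1_n1_moments(m, c, p, terms=5000):
    """Mean and variance of f_c under Scheme I with n = 1 (truncated series)."""
    q = 1.0 - p

    def f(y, n):
        return y / (m + 1) + c * (1.0 - n / (m + 1))

    # First observation is 0: sampling stops at once with Y = 0, N = 1.
    mean = q * f(0, 1)
    second = q * f(0, 1) ** 2
    # First observation is 1: continue until m more 1's; k counts the 0's seen.
    prob = p * p ** m
    for k in range(terms):
        value = f(m + 1, m + 1 + k)
        mean += prob * value
        second += prob * value * value
        prob *= q * (k + m) / (k + 1)   # negative binomial recursion in k
    return mean, second - mean * mean


def variance_formula(m, p0, p):
    """Right-hand side of (6.6)."""
    return p * (1.0 - p) / (m + 1) * (1.0 + m * (p0 / p - 1.0) ** 2)


m, p0 = 3, 0.5
for p in (0.2, 0.3, 0.6):
    mean, var = scheme1_n1_moments(m, p0, p)
    print(p, mean, var, variance_formula(m, p0, p), p * (1 - p))
```

With p_0 = 0.5 the threshold p_0/2 is 0.25, so the variance of f_{p_0} exceeds pq at p = 0.2 and falls below it at p = 0.3 and p = 0.6.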

Finally, it should be noted that the functions that are estimable efficiently at 
each p under this plan are precisely the functions that are estimable efficiently 
under a single sample plan of sample size m + 1. The latter plan yields the same 
E(N | p) as the above plan and admits an unbiased estimator of p with uniformly 
smaller variance than any of the estimators discussed above. 

The following variant of Scheme I also yields polynomials for E(N | p). 

Scheme II. Let n and m be two positive integers with n ≥ 2, and let


(6.7) B = {γ: N(γ) = n, Y(γ) < n − 1} ∪ {γ: Y(γ) = n + m}.


Thus, the boundary is the same as in Scheme I except that the point with co- 
ordinates X(y) = 1, Y(y) = n — 1, is omitted. In other words, a single sample 
of size n is taken. If all of the observations are equal to 1, sampling is continued 
until m additional 1’s are obtained. If all but one of the initial n observations 
are equal to 1, sampling is continued until m + 1 additional 1’s are obtained. If 
at least two of the initial n observations are not equal to 1, sampling ceases. 
Thus, denoting the number of successes in the first n observations by Y_n,


E(N | p) = E(N | Y_n < n − 1, p)Pr(Y_n < n − 1 | p)
            + E(N | Y_n = n − 1, p)Pr(Y_n = n − 1 | p)
            + E(N | Y_n = n, p)Pr(Y_n = n | p)
         = n(1 − p^n − np^{n−1}q)
            + [n + (m + 1)/p](np^{n−1}q) + [n + (m/p)]p^n
         = n + n(m + 1)p^{n−2} + (m − n − mn)p^{n−1}.
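
The polynomial expressions for E(N | p) under Schemes I and II can be checked by counting paths to the boundary. The following sketch is illustrative rather than part of the paper: points are written (x, y) with x the number of 0's and y the number of 1's, the helper names are hypothetical, and the unbounded plans are truncated at sample size `n_max`, which leaves a negligible tail for the parameter values used.

```python
def boundary_paths(is_boundary, n_max):
    """Number of paths K to each boundary point of sample size at most n_max."""
    counts = {}
    frontier = {(0, 0): 1}              # continuation points and their path counts
    for _ in range(n_max):
        reached = {}
        for (x, y), k in frontier.items():
            for point in ((x + 1, y), (x, y + 1)):
                reached[point] = reached.get(point, 0) + k
        frontier = {}
        for point, k in reached.items():
            if is_boundary(*point):
                counts[point] = k
            else:
                frontier[point] = k
        if not frontier:
            break
    return counts


def expected_n(is_boundary, p, n_max=400):
    """E(N | p) as the sum of N(g) K(g) p^Y(g) q^X(g) over boundary points g."""
    q = 1.0 - p
    paths = boundary_paths(is_boundary, n_max)
    return sum(k * (x + y) * p ** y * q ** x for (x, y), k in paths.items())


def scheme_one(n, m):
    return lambda x, y: (x + y == n and y < n) or y == n + m


def scheme_two(n, m):
    return lambda x, y: (x + y == n and y < n - 1) or y == n + m


n, m, p = 3, 2, 0.6
print(expected_n(scheme_one(n, m), p), n + m * p ** (n - 1))
print(expected_n(scheme_two(n, m), p),
      n + n * (m + 1) * p ** (n - 2) + (m - n - m * n) * p ** (n - 1))
```

Each printed pair should agree up to rounding; the same helper can be applied to the variants obtained by removing other points on the line N(γ) = n.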


The sampling plans included in Scheme II are also boundedly complete but 
not complete. 
There are obvious variants of Schemes I and II obtained by removing other 


points on the line N(y) = n from the boundary, or by interchanging the roles 
of X and Y. 


7. Selection of the sampling plan. Theorem 5.1 shows that in order to
estimate a function g(p) efficiently at p_0 a sampling plan must be selected for which


(7.1) E(N | p) = k[g(p) − g(p_0)]/(p − p_0).
The efficient estimator at p_0 is defined by
(7.2) f(γ) = a[q_0Y(γ) − p_0X(γ)] + g(p_0),







where a = 1/k. As discussed in Section 5, there does not always exist a sampling
plan under which a given function g(p) is estimable efficiently at a given value,
p_0. On the other hand, it is made clear in Section 6 that for a given g(p) and p_0,
there may exist more than one sampling plan satisfying (7.1). Indeed, it has been
shown that there may exist more than one plan with a given E(N | p) (i.e., with
a given value of k).

In this section it is assumed that for a given g(p), p_0, and k, there does exist
more than one sampling plan satisfying (7.1). Let 𝒮 denote the class of such plans.
Since every plan of 𝒮 yields the same E(N | p), and since, under every plan of 𝒮,
the estimator f given by (7.2) is efficient at p_0, it follows from the information
inequality that Var(f | p_0) is the same under every plan of 𝒮. In general,
however, for values of p other than p_0, Var(f | p) will be different under the various
plans of 𝒮. The problem considered here is that of determining the plan of 𝒮 for
which Var(f | p) is smallest at some value of p other than p_0, or the plan that
minimizes Var(f | p) in the neighborhood of p_0. It will be shown that this is
equivalent to determining the plan for which Var(N | p) is minimized at the
relevant values of p.

In the following derivation of Var(f | p) it is assumed for simplicity that,
in (7.2), a = 1 and g(p_0) = 0. Thus,


(7.3) f = q_0Y − p_0X = (qY − pX) + (p − p_0)N,
      f^2 = (qY − pX)^2 + (p − p_0)^2 N^2 + 2(p − p_0)N(qY − pX).
This yields, upon application of Lemmas 2.5 and 2.6,
(7.4) E(f | p) = (p − p_0)E(N | p),
      E(f^2 | p) = pqE(N | p) + (p − p_0)^2 E(N^2 | p) + 2(p − p_0)pqE'(N | p),
and hence,


(7.5) Var(f | p) = (p − p_0)^2 Var(N | p) + pqE(N | p) + 2(p − p_0)pqE'(N | p).


This expression yields the following theorem, where for any estimator T and
sampling plan S, E(T | p, S) is the expectation of T at p under the plan S.

THEOREM 7.1. Let f = a(q_0Y − p_0X) + b. Let p* be a value of p other than p_0.
Let S_1 and S_2 be two sampling plans such that E(N | p, S_1) = E(N | p, S_2) for all p.
Then Var(f | p*, S_1) ≥ Var(f | p*, S_2) if and only if Var(N | p*, S_1) ≥
Var(N | p*, S_2).
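
The variance identity behind Theorem 7.1 can be illustrated on a plan for which E'(N | p) does not vanish. The sketch below is an assumption-laden check, not the author's computation: it takes Scheme I with n = 2, for which E(N | p) = 2 + mp and hence E'(N | p) = m, sums the boundary distribution directly (truncated after `terms` terms), and compares Var(q_0Y − p_0X | p) with the expression (p − p_0)^2 Var(N | p) + pqE(N | p) + 2(p − p_0)pqE'(N | p). All function names are hypothetical.

```python
def scheme_one_n2(m, p, terms=5000):
    """Boundary distribution of Scheme I with n = 2, as triples (X, Y, probability)."""
    q = 1.0 - p
    points = [(2, 0, q * q), (1, 1, 2.0 * p * q)]   # sampling stops at N = 2
    prob = p ** (m + 2)                 # two initial 1's, then m more 1's with no 0's
    for k in range(terms):
        points.append((k, m + 2, prob))
        prob *= q * (k + m) / (k + 1)   # negative binomial recursion in k
    return points


def mean_var(points, stat):
    mean = sum(pr * stat(x, y) for x, y, pr in points)
    second = sum(pr * stat(x, y) ** 2 for x, y, pr in points)
    return mean, second - mean * mean


def both_sides(m, p0, p):
    """Var(q0 Y - p0 X | p) computed directly and through the variance identity."""
    q = 1.0 - p
    points = scheme_one_n2(m, p)
    _, var_f = mean_var(points, lambda x, y: (1.0 - p0) * y - p0 * x)
    mean_n, var_n = mean_var(points, lambda x, y: x + y)
    d_mean_n = m                        # derivative of E(N | p) = 2 + m p
    rhs = ((p - p0) ** 2 * var_n + p * q * mean_n
           + 2.0 * (p - p0) * p * q * d_mean_n)
    return var_f, rhs


print(both_sides(3, 0.5, 0.35))
print(both_sides(1, 0.3, 0.5))
```

Since the last two terms of the identity depend on the plan only through E(N | p), two plans with the same E(N | p) can differ in Var(f | p*) only through Var(N | p*), which is the content of the theorem.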

It follows from assumption (i) of Section 2 that Var(N | p) is a continuous 
function of p. Hence, again referring to (7.5), the following result is immediate. 

THEOREM 7.2. Let f, S_1, and S_2 satisfy the hypotheses of Theorem 7.1, and suppose
Var(N | p_0, S_1) < Var(N | p_0, S_2). Then there exists an interval I of positive
length containing p_0 such that


(7.6) Var(f | p_0, S_1) = Var(f | p_0, S_2),
      Var(f | p, S_1) < Var(f | p, S_2), for p ∈ I, p ≠ p_0.







One method that might at first seem useful in determining the plan of 𝒮 for
which Var(N | p_0) is minimized would be to show that, for a particular plan, this
variance attained the lower bound provided by the information inequality;
i.e., that


(7.7) Var(N | p_0) = p_0 q_0 [E'(N | p_0)]^2 / E(N | p_0).

The next lemma and theorem show that there exists such a plan only in the
trivial situations where 𝒮 contains an efficient sampling plan.
Lemma 7.1. For a given sampling plan,


(7.8) Var(N | p) = pq[E'(N | p)]^2 / E(N | p)

for all p if and only if the sampling plan is efficient.

PROOF. Suppose the plan is efficient. In a single sample plan, N is constant.
Hence, Var(N | p) = E'(N | p) = 0 for all p, and (7.8) holds. In an inverse
binomial sampling plan, N is efficient and hence again (7.8) holds.

Suppose the plan is not efficient. Since N is constant only under a single sample 
plan, it is not constant under the given plan. Thus, (7.8) cannot hold for all p, 
for if it did, N would be an efficient estimator. 

THEOREM 7.3. For any sampling plan, either


(7.9) Var(N | p) = pq[E'(N | p)]^2 / E(N | p) for all p,


or


(7.10) Var(N | p) > pq[E'(N | p)]^2 / E(N | p) for all p.


PROOF. From the information inequality,


(7.11) Var(N | p) ≥ pq[E'(N | p)]^2 / E(N | p) for all p.


Suppose equality holds in (7.11) for p = p_0. Then N must be of the form
N(γ) = a[q_0Y(γ) − p_0X(γ)] + b


for some constants a and b. Hence,


(1 − aq_0)Y(γ) + (1 + ap_0)X(γ) = b


for all boundary points γ. Since the coefficients of X and Y cannot both vanish,
the boundary points lie on a straight line, and, by Theorem 3.1, the plan is
efficient. The conclusion follows from Lemma 7.1.

It would be reassuring to know that, despite Theorem 7.3, there always exists
a plan in 𝒮 for which Var(N | p_0) is smallest. This would be true if 𝒮 contained
only a finite number of sampling plans. No general results have as yet been
obtained in this direction, but the following theorem concerning the specific
case where 𝒮 contains all plans with E(N | p) = 2 indicates the type of result
that might be expected.

THEOREM 7.4. There exist exactly three sampling plans for which E(N | p) = 2
for all p.

PROOF. Three plans have already been given for which E(N | p) = 2; namely,
the single sample plan, the plan S* given under Scheme I of Section 6, with
n = 1, m = 1, and the symmetric image of this plan obtained by interchanging
the roles of X and Y. It will now be shown that these are the only three plans.
Throughout the proof, points will be specified by their coordinates (X(γ), Y(γ)).

Consider a sampling plan S with boundary B for which E(N | p) = 2 for all p.
The points (0, 1) and (1, 0) cannot both be in B, for then E(N | p) = 1. If neither
(0, 1) nor (1, 0) is in B, then N ≥ 2. Thus, if E(N | p) = 2, then Pr(N = 2 | p) =
1 and S is the single sample plan. Suppose then that exactly one of the points
(0, 1) and (1, 0) is in B and, for the moment, assume it to be (1, 0). It will be
shown that S is the plan S* described above with boundary


B* = {(1, 0), (0, 2), (1, 2), (2, 2), (3, 2), ···}.


Suppose (0, 2) ∉ B. Then N ≥ 3 whenever the sample path goes through
(0, 2). Hence, E(N | p) ≥ 3p^2, which is greater than 2 for values of p arbitrarily
close to 1. Thus, (0, 2) ∈ B. Also, (1, 1) ∉ B, for, otherwise, Pr{N ≤ 2 | p} = 1,
Pr{N < 2 | p} > 0, and E(N | p) < 2. The proof is now completed by an
inductive argument.


Assume, for a given integer n, n ≥ 0, the points (1, 0) and (0, 2), (1, 2), ···,
(n, 2) are in B and the points (0, 1), (1, 1), ···, (n + 1, 1) are not in B. It must
be shown that (n + 1, 2) ∈ B and (n + 2, 1) ∉ B.

Suppose (n + 1, 2) ∉ B. Then N ≥ n + 4 whenever the sample path goes
through (n + 1, 2). This happens with probability p^2 q^{n+1}. Thus, by considering
the other points of B,


E(N | p) ≥ q + 2p^2 + 3p^2 q + ··· + (n + 2)p^2 q^n + (n + 4)p^2 q^{n+1}
         = q + p^2 (1 + 2q + ··· + (n + 1)q^n + (n + 2)q^{n+1})
             + p^2 (1 + q + ··· + q^n + q^{n+1}) + p^2 q^{n+1}
         = q + p^2 \frac{d}{dq} \left[ \frac{1 - q^{n+3}}{1 - q} \right] + p^2 \left[ \frac{1 - q^{n+2}}{1 - q} \right] + p^2 q^{n+1}
         = 2 + p^2 q^{n+1} − (n + 4)p q^{n+2} − q^{n+3}.
Now let p = 1 − δ, q = δ. Then
E(N | p) ≥ 2 + δ^{n+1} + o(δ^{n+1}),

and for δ positive but arbitrarily close to 0, E(N | p) > 2. It follows
that (n + 1, 2) ∈ B.
Now, if (n + 2, 1) were in B, then S would be a bounded procedure and its
continuation points would be a proper subset of the continuation points of S*.
Since S and S* yield the same E(N | p), this is impossible. Thus, (n + 2, 1) ∉ B
and the induction is completed. It follows that S = S*.

If, originally, it was assumed that (0, 1) ∈ B, (1, 0) ∉ B, then an entirely
analogous demonstration would show that S must be the symmetric image of S*
described at the beginning of the proof.
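
As an illustration of Theorem 7.4 (a sketch under stated assumptions, not part of the proof), the three plans can be written down as distributions of N and their first two moments compared. The representation of a plan as a list of (N, probability) pairs, the function names, and the truncation length `terms` are choices made here for convenience.

```python
def mean_var_n(points):
    """Mean and variance of N from a list of (N, probability) pairs."""
    mean = sum(pr * n for n, pr in points)
    return mean, sum(pr * n * n for n, pr in points) - mean * mean


def single_sample(p):
    return [(2, 1.0)]                   # N = 2 with probability one


def plan_s_star(p, terms=3000):
    """Scheme I with n = 1, m = 1: stop at (1, 0), otherwise sample until Y = 2."""
    q = 1.0 - p
    return [(1, q)] + [(k + 2, p * p * q ** k) for k in range(terms)]


def mirror_plan(p, terms=3000):
    """Symmetric image of S*, obtained by interchanging the roles of X and Y."""
    return plan_s_star(1.0 - p, terms)


p = 0.3
for name, plan in (("single", single_sample), ("S*", plan_s_star), ("mirror", mirror_plan)):
    print(name, mean_var_n(plan(p)))
```

All three means equal 2, while the variances are 0, 2q/p and 2p/q respectively. Since E'(N | p) = 0 for each of the three plans, Theorem 7.1 then indicates that, among them, the single sample plan gives the smallest Var(f | p) at every p other than p_0.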


8. On the completeness of bounded sampling plans. A bounded sampling plan 
is one for which there exists a positive integer n such that 


(8.1) Pr{N ≤ n | p} = 1.


The size of a bounded sampling plan is the smallest n for which (8.1) holds.

In the following Theorems 8.1-8.5 it is shown that if the boundary of a plan
of size n contains n + 1 + k points, k ≥ 0, then k is the dimension of the linear
space of unbiased estimators of 0.

THEOREM 8.1. The boundary of a sampling plan of size n contains at least n + 1 
points. 

PROOF. If n = 1, then (0, 1) and (1, 0) must both be boundary points.
Proceeding by induction, suppose the theorem to be true for n = m and consider
a plan S of size m + 1. Since m + 1 ≥ 2, the points (0, 1) and (1, 0) cannot
both be boundary points and there must exist a path through at least one of them 
that extends to a boundary point of sample size m + 1. Without loss of gen- 
erality, suppose this is true of the point (1, 0). Then, given that the sample 
path has reached (1, 0), the sampling plan (or what remains of it) is now of size 
m and hence, by the induction hypothesis, involves at least m + 1 boundary 
points. In other words, there are at least m + 1 boundary points that can be 
reached by paths through (1, 0). Since S is closed and bounded there must also 
exist a boundary point of the form (0, y). Since this point cannot be reached 
from (1, 0) it is not counted above, and the boundary contains at least m + 2 
points. 

THEOREM 8.2. If the boundary of a sampling plan of size n contains more than 
n + 1 points, the plan is not complete. 

PROOF. The probability, P(p; γ), of reaching a particular boundary point γ is


P(p; γ) = K(γ)p^{Y(γ)}q^{X(γ)},
a polynomial in p of degree at most n. Hence, the expectation


\sum_{γ ∈ B} f(γ)P(p; γ)


of any estimator f is also such a polynomial. Conversely, every linear combination
of the P(p; γ) can be attained as the expectation of some estimator. Thus, there
exists a non-trivial estimator f such that E(f | p) = 0 for all p if and only if there
exist constants a(γ), not all 0, such that \sum_{γ ∈ B} a(γ)P(p; γ) = 0 for all p. But the
linear space of polynomials in p of degree at most n has dimension n + 1 and
hence, if B contains more than n + 1 points, the set of polynomials
{P(p; γ): γ ∈ B} must be linearly dependent.










THEOREM 8.3. Under a sampling plan of size n, all polynomials in p of degree
at most n are estimable unbiasedly.

PROOF. The polynomial 1 is trivially estimable unbiasedly. Furthermore,
since the plan is of size n, there exists a boundary or continuation point γ_j of
sample size j for each j, 1 ≤ j ≤ n. Let γ_j = (x_j, y_j), where x_j + y_j = j. Girshick,
Mosteller, and Savage [7] have shown explicitly how to construct an unbiased
estimator of p^{y_j}q^{x_j}, a polynomial in p of degree j. Thus, the polynomials
1, p^{y_1}q^{x_1}, ···, p^{y_n}q^{x_n} are all estimable unbiasedly and, hence, so is any linear
combination of them. Since no two of these polynomials are of the same degree
they are linearly independent and thus form a basis for the (n + 1)-dimensional
space of all polynomials in p of degree at most n. It follows that every such
polynomial is estimable unbiasedly.

THEOREM 8.4. If the boundary of a sampling plan of size n contains exactly
n + 1 points the plan is complete.

PROOF. The expectation of every estimator is a linear combination of the
n + 1 polynomials P(p; γ), γ ∈ B, and by Theorem 8.3, every polynomial of
degree at most n can be expressed as such a linear combination. Thus, the set
{P(p; γ): γ ∈ B} spans the (n + 1)-dimensional linear space of polynomials. It
follows that the n + 1 polynomials P(p; γ) must be linearly independent and,
hence, there exist no non-trivial unbiased estimators of 0.

THEOREM 8.5. If the boundary of a sampling plan of size n contains n + 1 + k
points, k > 0, then there exist exactly k linearly independent unbiased estimators of 0.

PROOF. Consider the n + 1 + k boundary points γ_1, ···, γ_{n+1+k}. Each
estimator f can be considered as a vector (f_1, ···, f_{n+1+k}) where f(γ_j) = f_j,
j = 1, ···, n + 1 + k. Thus, the space of estimators can be considered as an
(n + 1 + k)-dimensional linear space, V. The expectation operator E is a linear
mapping from V onto the (n + 1)-dimensional linear space of polynomials in
p of degree at most n. The subspace V^0 = {f: E(f | p) = 0 for all p} is the null
space of this mapping and it follows from the standard theorems concerning rank
and nullity that V^0 has dimension k.
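
The rank and nullity argument can be carried out explicitly for a small plan. The sketch below is an illustration only: the example plan (sampling stops at the points (1, 1) and (2, 2), or otherwise at sample size 5), the helper names, and the use of exact rational elimination are all assumptions introduced here. Each boundary point contributes the coefficient vector of K(γ)p^{Y(γ)}(1 − p)^{X(γ)}, and the number of boundary points minus the rank of the resulting matrix gives the dimension of the space of unbiased estimators of 0.

```python
from fractions import Fraction
from math import comb


def boundary_paths(is_boundary, max_size=1000):
    """Path counts K to the boundary points of a bounded plan."""
    counts, frontier = {}, {(0, 0): 1}
    for _ in range(max_size):
        if not frontier:
            break
        reached = {}
        for (x, y), k in frontier.items():
            for point in ((x + 1, y), (x, y + 1)):
                reached[point] = reached.get(point, 0) + k
        frontier = {}
        for point, k in reached.items():
            if is_boundary(*point):
                counts[point] = k
            else:
                frontier[point] = k
    return counts


def reach_polynomial(x, y, k, degree):
    """Coefficients (constant term first) of K p^y (1 - p)^x."""
    coeffs = [0] * (degree + 1)
    for i in range(x + 1):
        coeffs[y + i] += k * comb(x, i) * (-1) ** i
    return coeffs


def rank(rows):
    """Rank of an integer matrix by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def example_plan(x, y):
    return (x, y) in {(1, 1), (2, 2)} or x + y == 5


paths = boundary_paths(example_plan)
size = max(x + y for x, y in paths)
matrix = [reach_polynomial(x, y, k, size) for (x, y), k in paths.items()]
nullity = len(paths) - rank(matrix)
print(len(paths), size, rank(matrix), nullity)
```

For this plan the boundary has 8 = n + 1 + k points with n = 5, so the computed nullity should be k = 2.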

The remainder of this section is devoted to bounded sampling plans that are 
complete only after the removal of some points from the boundary. It will be 
shown that for a plan of this type it is easy to explicitly construct a basis for 
the space of unbiased estimators of 0. 

The following notation is used in Theorems 8.6-8.8. It is assumed that S is a
bounded sampling plan with boundary B, and that β_1, ···, β_t with t > 0,
are t points of B such that the sampling plan with boundary B − {β_1, ···, β_t}
is closed and complete.

The sampling plan with boundary B_j = B − {β_j} is denoted by S_j for
j = 1, ···, t.

For any point γ ∈ B, K(γ) is the number of paths to γ under the plan S and
K_j(γ) is the number of paths to γ under the plan S_j for j = 1, ···, t. If γ is not
a boundary point for a particular plan then the number of paths to γ under
that plan is taken to be 0. Thus, K_j(β_j) = 0 for j = 1, ···, t.







It should be noted that K(γ) does not vanish for any γ ∈ B. Also, if γ ≠ β_j
then K_j(γ) ≥ K(γ). This follows from the fact that any path to γ under S is
obviously a path to γ under any sampling plan with fewer boundary points.


Consider now, under the sampling plan S, the estimators φ_j defined
for j = 1, ···, t by


(8.2) φ_j(γ) = 1 − [K_j(γ)/K(γ)],
for γ ∈ B.

THEOREM 8.6. Under the sampling plan S, the estimators φ_1, ···, φ_t form a
basis for the linear space of unbiased estimators of 0.

PROOF. It was shown by Girshick, Mosteller, and Savage [7], and it is very
easy to verify, that, for each j, φ_j is not identically 0 and E(φ_j | p) = 0 for all p.
Since the boundary of S contains t more points than that of a complete plan of
the same size as S it follows from Theorem 8.5 that the dimension of the space
of unbiased estimators of 0 is t. Thus, the theorem will be proven if it is shown
that the estimators φ_1, ···, φ_t are linearly independent.

Let the sample size of β_j be n_j (i.e., X(β_j) + Y(β_j) = n_j) and assume that
β_1, ···, β_t have been ordered so that n_1 ≤ n_2 ≤ ··· ≤ n_t. Consider the matrix
A with elements a_{ij} = φ_j(β_i); thus,


            φ_1(β_1) ··· φ_t(β_1)
(8.3) A =      ·              ·
            φ_1(β_t) ··· φ_t(β_t)


For j > i, K_j(β_i) = K(β_i). This follows by noting that a path to β_i under S_j
goes through points of sample size less than n_i and then through β_i. It does not
go through β_j since n_j ≥ n_i, and hence, it is also a path to β_i under S. Thus,
K_j(β_i) ≤ K(β_i). The reverse inequality has been stated in the comments
preceding the theorem. It follows from (8.2) that φ_j(β_i) = 0 for j > i.
Since φ_j(β_j) = 1 for each j, the matrix A is seen to be triangular with each of
its diagonal elements equal to 1. Hence, |A| = 1. It can now be concluded that
φ_1, ···, φ_t are linearly independent, for otherwise |A| must vanish. This
completes the proof.
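
The construction in Theorem 8.6 can be traced on a small example. In the sketch below, which is illustrative and not taken from the paper, the plan S stops at β_1 = (1, 1), at β_2 = (2, 2), or otherwise at sample size 5; removing both points leaves the single sample plan of size 5, which is closed and complete. The names `SIZE`, `STOPS` and the helper functions are hypothetical. The code builds the estimators φ_j(γ) = 1 − K_j(γ)/K(γ), checks that each has expectation identically 0 as a polynomial in p, and forms the matrix A.

```python
from fractions import Fraction
from math import comb

SIZE = 5
STOPS = [(1, 1), (2, 2)]                # beta_1, beta_2, ordered by sample size


def boundary_paths(interior_stops):
    """Path counts for the plan stopping at interior_stops or at sample size SIZE."""
    counts, frontier = {}, {(0, 0): 1}
    while frontier:
        reached = {}
        for (x, y), k in frontier.items():
            for point in ((x + 1, y), (x, y + 1)):
                reached[point] = reached.get(point, 0) + k
        frontier = {}
        for point, k in reached.items():
            if point in interior_stops or sum(point) == SIZE:
                counts[point] = k
            else:
                frontier[point] = k
    return counts


def expectation_coefficients(estimator, counts):
    """Coefficients in p of the sum over g of f(g) K(g) p^Y(g) (1 - p)^X(g)."""
    coeffs = [Fraction(0)] * (SIZE + 1)
    for (x, y), k in counts.items():
        for i in range(x + 1):
            coeffs[y + i] += estimator[(x, y)] * k * comb(x, i) * (-1) ** i
    return coeffs


K = boundary_paths(set(STOPS))          # path counts under S
phis = []
for beta in STOPS:
    K_j = boundary_paths(set(STOPS) - {beta})   # path counts under S_j
    phis.append({g: 1 - Fraction(K_j.get(g, 0), K[g]) for g in K})

A = [[phi[beta] for phi in phis] for beta in STOPS]   # a_ij = phi_j(beta_i)
for phi in phis:
    print(expectation_coefficients(phi, K))
print(A)
```

Every printed coefficient vector should consist of zeros, and A should be triangular with unit diagonal, which is the linear independence argument of the proof in miniature.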

Under a given sampling plan, an estimator f* is said to be a uniformly minimum
variance estimator of its expected value, if, for any other estimator f with the
same expectation as f*, Var(f* | p) ≤ Var(f | p) for all p. Consider the estimator
f* = q_0Y − p_0X that is efficient at p_0. The following theorem shows that under
a sampling plan of the type now being considered f* is not a uniformly minimum
variance estimator. On the other hand, if f is any other estimator with the same
expectation as f*, then Var(f* | p_0) < Var(f | p_0) and f is not a uniformly minimum
variance estimator. Thus, under the sampling plan S, if a non-constant function
g(p) is estimable efficiently at p_0 then there is no uniformly minimum variance
estimator of g(p).

THEOREM 8.7. Under the sampling plan S, the estimator q_0Y − p_0X is not a
uniformly minimum variance estimator.







PROOF. Lehmann and Scheffé [10] have shown that for bounded sampling
plans a necessary and sufficient condition for an estimator T to be a uniformly
minimum variance estimator is that E(Tφ | p) = 0 for every φ such that
E(φ | p) = 0. Suppose q_0Y − p_0X were a uniformly minimum variance estimator.
Then, for j = 1, ···, t,


(8.4) 0 = E[φ_j(q_0Y − p_0X) | p]
        = E[φ_j(qY − pX) | p] + (p − p_0)E(φ_jN | p).


By Lemma 2.5, E[φ_j(qY − pX) | p] = 0 for all p, and hence, E(φ_jN | p) = 0
for all p and each j. (Actually, equation (8.4) yields E(φ_jN | p) = 0 only for
p ≠ p_0. That E(φ_jN | p_0) = 0 follows from continuity.) In particular, E(φ_tN | p) =
0 where again it is assumed that the points β_1, ···, β_t have been ordered so
that n_1 ≤ ··· ≤ n_t.


It follows from Theorem 8.6 that there exist constants r_1, ···, r_t such that

(8.5) N(γ)φ_t(γ) = \sum_{j=1}^{t} r_j φ_j(γ)

for all γ ∈ B. In particular,

(8.6) n_i φ_t(β_i) = \sum_{j=1}^{t} r_j φ_j(β_i)

for i = 1, ···, t. Recalling that φ_j(β_j) = 1 and φ_j(β_i) = 0 for j > i, the values
of r_1, ···, r_t are readily found from (8.6) to be r_1 = ··· = r_{t−1} = 0, r_t = n_t.
Thus, from (8.5),
(8.7) N(γ)φ_t(γ) = n_t φ_t(γ)


for all γ ∈ B. It follows that φ_t(γ) = 0 for all γ such that N(γ) ≠ n_t. But, as
argued in the proof of Theorem 8.6, φ_t(γ) = 0 for all boundary points of sample
size n_t with the exception of β_t. Hence,


(8.8) E(φ_t | p) = φ_t(β_t)K(β_t)p^{Y(β_t)}q^{X(β_t)} ≠ 0,


a contradiction. It follows that q_0Y − p_0X is not a uniformly minimum variance
estimator.

The next theorem shows that, despite Theorem 8.7, there do exist non-constant 
uniformly minimum variance estimators. 

THEOREM 8.8. Under the sampling plan S, there exist non-constant uniformly 
minimum variance estimators. 

PROOF. Since S is bounded, there exists a boundary point α_0 = (x, 0) and a
boundary point α_1 = (0, y). Neither α_0 nor α_1 is one of the points β_1, ···, β_t
since if either is removed from B the resulting plan is not closed. Furthermore,
φ_j(α_0) = φ_j(α_1) = 0 for j = 1, ···, t since, under any of the plans S or S_j, there
is only one path to either α_0 or α_1. Now consider an estimator f of the form


      f(α_0) = f_0,
(8.9) f(α_1) = f_1,
      f(γ) = c for γ ∈ B − {α_0, α_1}.


Then E(fφ_j | p) = cE(φ_j | p) = 0 for j = 1, ···, t, and it follows from Theorem
8.6 and the condition of Lehmann and Scheffé given at the beginning of the proof
of Theorem 8.7 that f is a uniformly minimum variance estimator.


9. Acknowledgments. I am deeply indebted to L. J. Savage, under whose
careful and considerate guidance this research was done, and to S. G. Ghurye
and W. H. Kruskal for several helpful comments.
REFERENCES
[1] R. R. Bahadur, "Sufficiency and statistical decision functions," Ann. Math. Stat.,
Vol. 25 (1954), pp. 423-462.
[2] R. R. Bahadur, "On unbiased estimates of uniformly minimum variance," Sankhyā,
Vol. 18 (1957), pp. 211-224.
[3] E. W. Barankin, "Locally best unbiased estimates," Ann. Math. Stat., Vol. 20 (1949),
pp. 477-501.
[4] D. Blackwell, "Conditional expectation and unbiased sequential estimation," Ann.
Math. Stat., Vol. 18 (1947), pp. 105-110.
[5] R. Courant, Differential and Integral Calculus, Vol. 1, Interscience Publishers, Inc.,
New York, 1937.
[6] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, 2nd
Edition, John Wiley and Sons, New York, 1957.
[7] M. A. Girshick, F. Mosteller, and L. J. Savage, "Unbiased estimates for certain
binomial sampling problems with applications," Ann. Math. Stat., Vol. 17 (1946),
pp. 13-23.
[8] J. B. S. Haldane, "A labour-saving method of sampling," Nature, Vol. 155 (1945),
pp. 49-50.
[9] J. B. S. Haldane, "On a method of estimating frequencies," Biometrika, Vol. 33
(1945), pp. 222-225.
[10] E. L. Lehmann and H. Scheffé, "Completeness, similar regions and unbiased
estimation: Part I," Sankhyā, Vol. 10 (1950), pp. 305-340.
[11] E. L. Lehmann and C. Stein, "Completeness in the sequential case," Ann. Math.
Stat., Vol. 21 (1950), pp. 376-385.
[12] L. J. Savage, "A uniqueness theorem for unbiased sequential estimation," Ann. Math.
Stat., Vol. 18 (1947), pp. 295-297.
[13] L. J. Savage, The Foundations of Statistics, John Wiley and Sons, New York, 1954.
[14] C. Stein, "Unbiased estimates with minimum variance," Ann. Math. Stat., Vol. 21
(1950), pp. 406-415.
[15] M. C. K. Tweedie, "Inverse statistical variates," Nature, Vol. 155 (1945), p. 453.
[16] A. Wald, Sequential Analysis, John Wiley and Sons, New York, 1947.
[17] J. Wolfowitz, "On sequential binomial estimation," Ann. Math. Stat., Vol. 17 (1946),
pp. 489-493.
[18] J. Wolfowitz, "The efficiency of sequential estimates and Wald's equation for
sequential processes," Ann. Math. Stat., Vol. 18 (1947), pp. 215-230.





A SINGLE-SAMPLE MULTIPLE-DECISION PROCEDURE FOR 
SELECTING THE MULTINOMIAL EVENT WHICH HAS 
THE HIGHEST PROBABILITY! 


By ROBERT E. BECHHOFER, SALAH ELMAGHRABY, AND NORMAN MORSE
Sibley School of Mechanical Engineering, Cornell University 


Summary. The problem of selecting the multinomial event which has the 
highest probability is formulated as a multiple-decision selection problem. 
Before experimentation starts the experimenter must specify two constants 
(θ*, P*) which are incorporated into the requirement: "The probability of a
correct selection is to be equal to or greater than P* whenever the true (but
unknown) ratio of the largest to the second largest of the population probabilities
is equal to or greater than θ*." A single-sample procedure which meets the
requirement is proposed. The heart of the procedure is the proper choice of N,
the number of trials. Two methods of determining N are described: the first is 
exact and is to be used when N is small; the second is approximate and is to be 
used when N is large. Tables and sample calculations are provided. 


1. Introduction. We are concerned in this paper with the multiple-decision 
problem which arises when one attempts to answer questions such as the fol- 
lowing: 

(a) Which of the six faces of a loaded die has the largest probability of landing 
face up? 

(b) Which of the thirty-six "bettable" numbers on an unbalanced roulette
wheel has the largest probability associated with it? 

(c) Which of the k television programs available to a given TV audience in a 
certain locale can claim the largest proportion of the total audience as listeners? 

The multinomial distribution provides a statistical model for dealing with 
each of these questions. In the following sections it is shown how such questions 
as these can be formulated as multiple-decision selection problems. A single-sample 
procedure is proposed which provides a solution to these problems. 


2. Statistical assumptions, and definitions. Let X_j = (X_{1j}, X_{2j}, ···, X_{kj})
be independent vector-observations from the same multinomial population with
a common unknown probability vector p = (p_1, p_2, ···, p_k); here p_i is the
probability of the event E_i (0 ≤ p_i < 1, \sum_{i=1}^{k} p_i = 1) and X_{ij} = 1 or 0 according as
E_i does or does not occur on the jth observation (i = 1, 2, ···, k; j = 1, 2, ···).
Let p_[1] ≤ p_[2] ≤ ··· ≤ p_[k] denote the ranked probabilities. It is assumed





Received February 5, 1958; revised March 25, 1958. 

! This research was supported by the United States Air Force through the Air Force 
Office of Scientific Research of the Air Research and Development Command, under Con- 
tract No. AF 18(600)-331. Reproduction in whole or in part is permitted for any 
purpose of the United States Government. 







MULTIPLE-DECISION PROCEDURE 103 


that the experimenter has no a priori knowledge which would rule out any of the
k! possible pairings of the p_[i] with the E_i (i = 1, 2, ···, k). Let θ_{i,j} = p_[i]/p_[j]
(i ≥ j; i, j = 1, 2, ···, k). Let N be the number of vector observations; let
Y_{iN} = \sum_{j=1}^{N} X_{ij} (i = 1, 2, ···, k), and let y_{[1]N} ≤ y_{[2]N} ≤ ··· ≤ y_{[k]N} denote
their ranked values. Let E_{[i]} denote the event associated with y_{[i]N}. Let Y_{(i)N}
be that one of the Y_{iN} which is associated with the event having probability
p_[i] (i = 1, 2, ···, k).


3. Goal, specification, and requirement. We now state the experimenter’s 
goal, specification, and requirement: 

Goal. The experimenter's goal is to select the event associated with $p_{[k]}$.

The statistical formulation of the problem for this goal involves the true ratio
$\theta_{k,k-1} = \theta$ (say) and the true probability $P$ of a correct selection. It is assumed
that before experimentation starts the experimenter can specify a pair of constants
$(\theta^*, P^*)$ with $1 < \theta^* < \infty$ and $1/k < P^* < 1$ as described below.

Specification. The experimenter specifies:

(a) The smallest value, $\theta^*$, of the ratio $\theta$ that is worth detecting, and

(b) The smallest acceptable value, $P^*$, of the probability $P$ of achieving the
above goal when $\theta \ge \theta^*$.

The specification above is summarized in the following:


Requirement. The experimenter requires that the procedure to be used guarantee
that


Probability {Correct selection | $\theta_{k,k-1} \ge \theta^*$} $\ge P^*$.


That is, the probability of a correct selection is to be equal to or greater than
$P^*$ whenever the true (but unknown) ratio of the largest to the second largest
of the population probabilities $p_i$ is equal to or greater than $\theta^*$.


4. Procedure. We propose a single-sample procedure that will guarantee the 
requirement. It is similar to ones described in [1], [2], and [6] which are used 
to solve selection problems involving parameters associated with other basic 
distributions. (A sequential procedure which will guarantee the same require- 
ment (above) was reported on in [3]; the theory underlying the sequential pro- 
cedure will be given in a paper with the same title as [3] which is being prepared 
by the authors of that abstract. Either the sequential procedure reported on 
in [3] or the single-sample procedure described in the present paper can also be 
used to solve problems of the type posed in [4].) 

Our single-sample procedure takes the following form: 

Procedure. 

(a) Select a random sample of N vector-observations. 

(b) Compute the $y_{iN}$ ($i = 1, 2, \ldots, k$).

(c) If exactly $s$ ($s = 1, 2, \ldots, k$) of the $y_{iN}$ are tied for largest, $y_{[k-s+1]N} =
y_{[k-s+2]N} = \cdots = y_{[k]N}$, select as the event associated with $p_{[k]}$ one of $E_{[k-s+1]N}$,
$E_{[k-s+2]N}, \ldots, E_{[k]N}$ using a random device which assigns probability $1/s$ to





104 R. E. BECHHOFER, S. ELMAGHRABY AND N. MORSE


each of them. In particular, if there is a single largest one, i.e., if $s = 1$, select
$E_{[k]N}$.

The heart of the problem in terms of designing the experiment is the proper choice
of $N$; it must be chosen just large enough to guarantee the requirement, i.e., $P^*$ must
be achieved for all possible $p = (p_1, p_2, \ldots, p_k)$ for which $\theta_{k,k-1} \ge \theta^*$. In
order to accomplish this we consider the least favorable configuration (l.f.c.) of
the $p_{[i]}$'s, the definition of which follows.

We define the l.f.c. of the $p_{[i]}$'s as that probability vector $p$ which for any
given $N$, $k$, $\theta^*$ minimizes the probability of a correct selection when $\theta_{k,k-1} \ge \theta^*$.
It is proved in [7] that:

(a) The l.f.c. is independent of $N$.

(b) The l.f.c. is given by

(1)  $\theta_{k,i} = \theta^*$  ($i = 1, 2, \ldots, k - 1$).

Since $\sum_{i=1}^{k} p_{[i]} = 1$, this implies

(2)  $p_{[1]} = p_{[2]} = \cdots = p_{[k-1]} = \dfrac{1}{\theta^* + k - 1}$,  $p_{[k]} = \dfrac{\theta^*}{\theta^* + k - 1}$.


It is intuitively clear that for any $p$ for which the $p_i$ are not all equal, the
probability of a correct selection increases with $N$. Hence if $N$ is chosen large enough
to guarantee a specified probability of a correct selection when the configuration
is least favorable, it will guarantee at least that probability for any configuration
of $p_{[i]}$'s with $\theta_{k,k-1} \ge \theta^*$, i.e., it will be large enough to guarantee the
requirement. We shall denote by $N^*$ the smallest value of $N$ which will guarantee the
requirement.
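
As a small illustration of (2), the following sketch builds the least favorable configuration for given $k$ and $\theta^*$. The function name is ours (hypothetical), not the paper's; the example call uses the roulette values $k = 36$, $\theta^* = 1.10$ that appear later in the numerical example.

```python
def least_favorable_configuration(k, theta_star):
    """Return [p_[1], ..., p_[k]] as given by equation (2).

    The first k - 1 entries are equal; the last entry is the largest
    probability p_[k] = theta_star / (theta_star + k - 1).
    """
    if k < 2 or theta_star <= 1:
        raise ValueError("need k >= 2 and theta_star > 1")
    denom = theta_star + k - 1
    return [1.0 / denom] * (k - 1) + [theta_star / denom]


p = least_favorable_configuration(36, 1.10)
print(p[0], p[-1], sum(p))
```

The ratio of the last entry to any other entry is $\theta^*$ by construction, which is exactly condition (1).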

In the next section we shall show how to compute the exact probability of a 
correct selection for any k, N, and p; this method is appropriate when N is small. 
Methods of approximating these probabilities when N is not too small are given 
in sections 6 and 7. Using these methods it is possible to compute tables from 
which N* can be determined for any specification. Several such tables are given 
at the end of this paper. 


5. Exact probability of a correct selection. For any fixed $k$ and $N$ and any
probability vector $p$ with $p_{[k-1]} < p_{[k]}$, the probability of a correct selection is
given by

(3)  $\phi_k = \phi(p_{[1]}, \ldots, p_{[k]}) = \sum \dfrac{1}{s}\, \dfrac{N!}{y_{(1)N}! \cdots y_{(k)N}!}\; p_{[1]}^{\,y_{(1)N}} \cdots p_{[k]}^{\,y_{(k)N}}$

where the summation is over all vectors $y_N = (y_{(1)N}, y_{(2)N}, \ldots, y_{(k)N})$ such
that $\sum_{i=1}^{k} y_{(i)N} = N$ and $y_{(k)N} \ge y_{(i)N}$ ($i = 1, 2, \ldots, k - 1$), and $s$, which is a
function of $y_N$, is the number of $y_{(i)N}$'s tied for largest. (Note: If exactly $s$ ($s =
1, 2, \ldots, k$) of the $p_{[i]}$ are tied for largest, then $p_{[k-s+1]} = p_{[k-s+2]} = \cdots =
p_{[k]}$, and we consider the selection of any one of the associated $s$ events as a
correct selection.)







For fixed $k$ and $N$ it is straightforward (but tedious) to express $\phi_k$ in terms of
the $p_{[i]}$ ($i = 1, 2, \ldots, k$); for example, if $k = 3$, $N = 4$ and $p_{[3]} > p_{[2]}$, it is
easy to verify that

(4)  $\phi(p_{[1]}, p_{[2]}, p_{[3]}) = p_{[3]}^4 + 4p_{[3]}^3(1 - p_{[3]}) + 3p_{[3]}^2\,(p_{[1]}^2 + 4p_{[1]}p_{[2]} + p_{[2]}^2)$

and if the $p_{[i]}$ ($i = 1, 2, 3$) are in the l.f.c. (2), this reduces to

(5)  $\phi\!\left(\dfrac{1}{\theta^* + 2}, \dfrac{1}{\theta^* + 2}, \dfrac{\theta^*}{\theta^* + 2}\right) = \dfrac{(\theta^*)^2}{(\theta^* + 2)^4}\,\{(\theta^*)^2 + 8\theta^* + 18\}$

which involves only $\theta^*$. Substituting the numerical value of the specified $\theta^*$ in
(5) gives the exact probability of a correct selection for $k = 3$ and $N = 4$ when
the $p_{[i]}$ ($i = 1, 2, 3$) are in the l.f.c. (2). General expressions of the type of (4)
can be obtained for arbitrary $k$ and $N$, and using (2) these can be reduced to
expressions of the type of (5) which involve only $\theta^*$. The exact probabilities of a
correct selection which are listed in Tables A-2, A-3, and A-4 are all associated
with the l.f.c. and were computed using expressions of the type of (5). It should
be emphasized, however, that it can be extremely laborious to express (3) in
the form (5), especially when $N$ is moderately large. For example, for $k = 4$
and $N = 20$ the expression analogous to the term in braces in (5) is a fifteenth-degree
polynomial in which five of the coefficients are eleven-digit integers. The
evaluation of such expressions for the specified $\theta^*$ is an additional problem
(although this can be done expeditiously on a high-speed electronic computer).
It thus is obvious that some large-sample approximation for (3) would be
desirable.² We consider this problem in the next sections.
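
The sum in (3) can also be evaluated by direct enumeration, which is a convenient check on closed forms such as (5). The sketch below is not the authors' computing method; the function names are ours, and it assumes the cell probabilities are supplied with the best cell last and that the largest probability is unique.

```python
from math import factorial


def lfc(k, theta_star):
    # Least favorable configuration (2); the last entry is p_[k].
    d = theta_star + k - 1
    return [1.0 / d] * (k - 1) + [theta_star / d]


def compositions(n, parts):
    # Yield every tuple of `parts` non-negative integers summing to n.
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest


def exact_pcs(p, n):
    """Probability of a correct selection from equation (3).

    `p` lists the cell probabilities with the best cell last; ties for the
    largest count are broken at random, hence the factor 1/s.
    """
    total = 0.0
    for y in compositions(n, len(p)):
        top = max(y)
        if y[-1] != top:
            continue  # the best cell is not among the largest counts
        s = y.count(top)  # number of counts tied for largest
        coef = factorial(n)
        prob = 1.0
        for yi, pi in zip(y, p):
            coef //= factorial(yi)  # builds the multinomial coefficient
            prob *= pi ** yi
        total += coef * prob / s
    return total


theta = 1.02
closed_form = theta**2 * (theta**2 + 8 * theta + 18) / (theta + 2) ** 4
print(exact_pcs(lfc(3, theta), 4), closed_form)
```

The number of terms grows like $\binom{N + k - 1}{k - 1}$, so this brute-force sum is only practical for small $k$ and moderate $N$, which is consistent with the remark above that a large-sample approximation is desirable.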


6. Large-sample approximations to the probability of a correct selection.
First we shall consider the case $k = 3$ to see what might be involved in making
a large-sample approximation. We have

(6)  $\phi_3 = \Pr\{Y_{(3)N} - Y_{(1)N} > 0,\; Y_{(3)N} - Y_{(2)N} > 0\}$
       $+ \tfrac{1}{2}\Pr\{Y_{(3)N} - Y_{(1)N} = 0,\; Y_{(3)N} - Y_{(2)N} > 0\}$
       $+ \tfrac{1}{2}\Pr\{Y_{(3)N} - Y_{(1)N} > 0,\; Y_{(3)N} - Y_{(2)N} = 0\}$
       $+ \tfrac{1}{3}\Pr\{Y_{(3)N} - Y_{(1)N} = 0,\; Y_{(3)N} - Y_{(2)N} = 0\}$.

Because of the equality signs, the last three terms become negligible for large
$N$; hence, for large $N$ we shall approximate $\phi_3$ by

(7)  $\phi_3 \approx \Pr\{Y_{(3)N} - Y_{(1)N} \ge 0,\; Y_{(3)N} - Y_{(2)N} \ge 0\}$.


² The referee has called our attention to an unpublished Stanford technical report, "A
procedure for determining the loaded face of a die," 1952, written by S. G. Allen, Jr.,
which reports some results of H. Rubin and the late M. A. Girshick. They were concerned 
with obtaining a large-sample approximation for (3) when (1) holds. The approximation 
that they propose is different from ours and appears to be much more tedious to apply. 
We have not attempted to determine which approximation is the better. 










For arbitrary $k$ we let

(8)  $W_i = Y_{(k)N} - Y_{(i)N}$  ($i = 1, 2, \ldots, k - 1$)

and approximate (3) by

(9)  $\phi_k \approx \Pr\{W_1 \ge 0,\; W_2 \ge 0,\; \ldots,\; W_{k-1} \ge 0\}$.


TABLE A-2
Exact Probability of a Correct Selection for k = 2 and selected θ* and N when
$p_{[2]}/p_{[1]} = \theta^*$

                                          θ*
 N     1.02     1.04     1.06     1.08     1.10     1.20     1.30     1.40     1.50     1.60

 1
 2
 3   .507426  .514704  .521838  .528832  .535687  .567994  .597271  .623843  .648000  .670005
 4   .507426  .514704  .521838  .528832  .535687  .567994  .597271  .623843  .648000  .670005
 5   .509282  .518378  .527290  .536022  .544575  .584759  .620903  .653381  .682560  .708788
 6   .509282  .518378  .527290  .536022  .544575  .584759  .620903  .653381  .682560  .708788
 7   .510828  .521438  .531830  .542005  .551965  .598614  .640261  .677312  .710208  .739386
 8   .510828  .521438  .531830  .542005  .551965  .598614  .640261  .677312  .710208  .739386
 9   .512181  .524114  .535798  .547232  .558417  .610637  .656910  .697670  .733432  .764734
10   .512181  .524114  .535798  .547232  .558417  .610637  .656910  .697670  .733432  .764734

11   .513399  .526522  .539367  .551930  .564210  .621369  .671640  .715483  .753498  .786332
12   .513399  .526522  .539367  .551930  .564210  .621369  .671640  .715483  .753498  .786332
13   .514515  .528729  .542636  .556230  .569509  .631125  .684913  .731359  .771156  .805076
14   .514515  .528729  .542636  .556230  .569509  .631125  .684913  .731359  .771156  .805076
15   .515551  .530777  .545668  .560217  .574417  .640109  .697028  .745691  .786897  .821554

16   .515551  .530777  .545668  .560217  .574417  .640109  .697028  .745691  .786897  .821554
17   .516523  .532697  .548509  .563949  .579009  .648462  .708193  .758754  .801064  .836180
18   .516523  .532697  .548509  .563949  .579009  .648462  .708193  .758754  .801064  .836180
19   .517440  .534509  .551189  .567468  .583336  .656286  .718558  .770748  .813908  .849257
20   .517440  .534509  .551189  .567468  .583336  .656286  .718558  .770748  .813908  .849257

21   .518312  .536229  .553734  .570807  .587437  .663657  .728237  .781826  .825622  .861019
22   .518312  .536229  .553734  .570807  .587437  .663657  .728237  .781826  .825622  .861019
23   .519143  .537871  .556160  .573989  .591342  .670635  .737319  .792107  .836357  .871648
24   .519143  .537871  .556160  .573989  .591342  .670635  .737319  .792107  .836357  .871648
25   .519940  .539444  .558484  .577034  .595077  .677267  .745874  .801686  .846232  .881292

26   .519940  .539444  .558484  .577034  .595077  .677267  .745874  .801686  .846232  .881292
27   .520706  .540956  .560716  .579958  .598660  .683590  .753961  .810641  .855348  .890072
28   .520706  .540956  .560716  .579958  .598660  .683590  .753961  .810641  .855348  .890072
29   .521445  .542413  .562867  .582773  .602107  .689639  .761626  .819036  .863787  .898087
30   .521445  .542413  .562867  .582773  .602107  .689639  .761626  .819036  .863787  .898087




























TABLE A-2 (Continued) 


-750000) . 
-750000) . 
-843750) . 
.843750) . 
896484 


-896484) . 
-929443) . 
- 929443 

-951073] . 
-951073) . 


‘ -877915) . -928249) . ‘ -965673) . 
.838967) . -877915) . -928249| . ’ -965673) . 
.858301| . .896461) . 943483) . : 975710) . 
.858301) . 896461) . -943483) . ‘ .975710) . 
.874788) . -911768) . -955231) . ‘ -982700) . 


.874788) . -911768 -955231) . : 982700) . 
888983) . -924525) . -964376) . : -987615) . 
.888983) . - 924525 -964376) . . -987615) . 
-901294) . -935234 -971550| .980883) . .991097 
-901294| . . 935234 .971550| .980883) . -991097 





-912036) . .944277| .964509| .977209) .985219) . 993577 

.912036) . -944277| .964509) .977209) .985219) .990311|} .993577| . 
921452) .938586| .951950) .970463) .981695) .988540| .992745) .995353)1. 
.921452| .938586) .951950) .970463) .981695) .988540| .992745| .995353)1. 
-929740| .946005) .958486) .975366) .985265) .991094| .994555| .996630)1. 


.908609| .929740) .946005) .958486] .975366| .985265) .991094| .994555) .996630)1. 
.916740| .937058| .952450) .964073) .979418) .988116| .993066| .995904| .997549)1. 
.916740| .937058| .952450) .964073) .979418] .988116| .993066) .995904/ .997549)1. 
924053] .943538) .958069) .968861) .982776] .990399) .994590) .996914/ .998216/1. 


.924053} .943538) .958069) .968861 982776) - 990399) -994590] .996914) .998216)1 .000000 


6.1 The normal approximation. In the least favorable configuration the
chance variables $W_i$ ($i = 1, 2, \ldots, k - 1$) have a $(k - 1)$-variate distribution
with

(10a)  $E\{W_i\} = \dfrac{N(\theta^* - 1)}{\theta^* + k - 1}$

(10b)  $\mathrm{Var}\{W_i\} = \dfrac{N[(k + 2)\theta^* + (k - 2)]}{(\theta^* + k - 1)^2}$

(10c)  $\mathrm{Cov}\{W_i, W_j\} = \dfrac{N[(k + 1)\theta^* - 1]}{(\theta^* + k - 1)^2}$  ($i \ne j$)

(10d)  $\mathrm{Corr}\{W_i, W_j\} = \dfrac{(k + 1)\theta^* - 1}{(k + 2)\theta^* + (k - 2)}$  ($i \ne j$).

It is to be noted that the $W_i$ have a common mean and a common variance, and
all pairs $(W_i, W_j)$ with $i \ne j$ have a common correlation. The standardized
variables obtained by subtracting the common mean from each $W_i$ and dividing
the differences by the common standard deviation have zero mean, unit variance
and correlation (10d). As $N$ approaches infinity it can be shown that their joint


TABLE A-3
Exact Probability of a Correct Selection for k = 3 and selected θ* and N when
$p_{[3]}/p_{[2]} = p_{[3]}/p_{[1]} = \theta^*$

                                          θ*
 N     1.02     1.04     1.06     1.08     1.10     1.20     1.30     1.40     1.50     1.60

 1   .337748  .342105  .346405  .350649  .354839  .375000  .393939  .411765  .428571  .444444
 2   .337748  .342105  .346405  .350649  .354839  .375000  .393939  .411765  .428571  .444444
 3   .339230  .345067  .350845  .356563  .362223  .389648  .415644  .440261  .463557  .485597
 4   .340211  .347015  .353746  .360404  .366988  .398804  .428798  .457023  .483549  .508459
 5   .340708  .348015  .355254  .362424  .369524  .403954  .436571  .467376  .496400  .523701

 6   .341530  .349659  .357717  .365703  .373614  .412000  .448349  .482601  .514760  .544870
 7   .342261  .351116  .359896  .368599  .377220  .419041  .458579  .495737  .530497  .562902
 8   .342690  .351980  .361198  .370341  .379405  .423441  .465140  .504341  .540988  .575098
 9   .343331  .353264  .363125  .372909  .382612  .429767  .474385  .516236  .555226  .591360
10   .343924  .354448  .364899  .375270  .385556  .435526  .482730  .526881  .567859  .605664

11   .344311  .355228  .366075  .376845  .387531  .439500  .488625  .534549  .577108  .616279
12   .344852  .356312  .367703  .379016  .390244  .444851  .496410  .544485  .588874  .629544
13   .345363  .357335  .369237  .381060  .392794  .449841  .503615  .553613  .599602  .641549
14   .345718  .358049  .370316  .382506  .394608  .453486  .508992  .560545  .607866  .650901
15   .346193  .359004  .371751  .384422  .397003  .458204  .515819  .569185  .617984  .662159

16   .346649  .359916  .373121  .386247  .399282  .462662  .522228  .577239  .627350  .672506
17   .346978  .360581  .374125  .387593  .400971  .466050  .527195  .583582  .634822  .680846
18   .347408  .361444  .375422  .389326  .403137  .470311  .533327  .591273  .643725  .690617
19   .347822  .362275  .376671  .390991  .405217  .474375  .539137  .598513  .652051  .699693
20   .348131  .362899  .377614  .392256  .406805  .477552  .543766  .604368  .658865  .707190

21   .348525  .363692  .378808  .393850  .408798  .481465  .549362  .611321  .666818  .715798
22   .348908  .364459  .379962  .395391  .410723  .485220  .554699  .617912  .674310  .723854
23   .349200  .365050  .380855  .396589  .412226  .488221  .559043  .623353  .680564  .730636
24   .349566  .365787  .381966  .398073  .414083  .491856  .564207  .629708  .687746  .738300
25   .349923  .366505  .383045  .399514  .415882  .495360  .569158  .635764  .694548  .745512

26   .350202  .367067  .383896  .400655  .417315  .498211  .573255  .640847  .700315  .751678
27   .350546  .367759  .384939  .402049  .419059  .501616  .578062  .646703  .706852  .758554
28   .350882  .368435  .385956  .403407  .420755  .504912  .582687  .652307  .713070  .765054
29   .351147  .368973  .386770  .404499  .422126  .507632  .586569  .657072  .718410  .770679
30   .351472  .369627  .387756  .405818  .423776  .510844  .591073  .662503  .724397  .776883
















TABLE A-3 (Continued) 


1.70 J oa 2.00 2.20 2.40 2.60 
| .459459| . .487179| .500000| .523810| .545455, .565217] . 
| 450459) . .487179| .500000| .523810) .545455) .565217| . ‘ 
| 506446) . -544834) .562500| .595076| 624343, .650694) . .696000| . 
.531844) . .574400| .593750) .629013| .660201) .687858) . . 734400) . 
| .549348) . .596006| .617188) .655677| .689539) .719367] . .768960) . 


| 573002 . 623707] .646484 .687420| .722879| .753615| .780302, .803520, . 
| .593033) .620996| .646908) .670898| .713621] . .781411) .808186| .831168) . 
.606741| .636019| .663056| .687988| .732096] . -801050} .827817| .850522 
624709) .655391| .383552| .700351| .754530| . .823677| .849902| .871811 


| .640379| -672142) .701122| .727509| .773277| . -841960) .867454) .888455) . 





-652138| .684827| .714528) .741447| .787793) . -856218) .881142| .901414| . 
-666581) .700148) .730459| .757750| .804259| . .871496| .895425) .914596) . 
| .679553) .713808) .744556| .772069) .818512| . .884350| .907277| .925387) .96 
| .689754) .724627| .755785) .783525| .829974) . -894690) .916778) .933995) . 
| .701829) .737229| .768662| .796463) .842547| .878110) .905440) .926415| .942521 





| -712849) .748649) .780247) .808021) .853620} .888334| .914642| .934550] .949621 
| .721803 757985} - 789763) .817546) .862776) .896789| .922234) .941231) .955417 
| .732124| .768579| .800397| .828033| .872575| .905597| .929943| .947853] .961031| . 
| -741646) .778285) .810073| .837510| .881309] .913341| .936627| .953517| .965769) . 

. 749569} .786405) - 818200) -845490| .888677| .919865) .942239| .958246) .969696 





.758521| .795439| .827106| .854109| .896418] .926542) .947841) .962856) .973441| . 
.766844| .803783| .835277| .861964| .903378] .932464) .952742) .966836' .976630| . 
.773897| .810887] .842256| .868687| .909339 937525! .956912! .970200| .979306| . 
.781741| .818669| .849790| .875841| .915513) .942633| .961019| .973439] 981825) . 
.789077| .825899| .856745) .882401) .921100| .947194) .964638 .976255| .983987| .{ 























-795385| .832141| .862765, .888089| .925944| .951138) .967749| .978658| .985817\1. 
-802311| .838895) .869184! .894068) .930902| .955073| .970782) .980949| .98752411. 
-808819} .845199| .875139| .899580| .935412| .958605! .973468) .982950 .988996'1.000000 
-814481| .850706| .880351) .904409| .939360| .961685| .975795| .984672| .990250 1.000000 
-820632| .856601) .885853) .909437) .943363] .964731| .978045, .986299| .991412'1.000000 


21 

22) 
23) 
24) 
25| 
26) 
27) 
28 
30 


distribution approaches a $(k - 1)$-variate normal distribution. Thus, for large
$N$ the probability of a correct selection in the least favorable configuration can
be approximated by the volume under a $(k - 1)$-variate normal surface. However,
for $k \ge 4$ such volumes are tabulated (see Table 1 of [1], Table A1 of [5],
and [9]) only for the particular correlation matrix $\{\rho_{ij}\}$ where

(11)  $\rho_{ij} = 1$ if $i = j$,  $\rho_{ij} = \tfrac{1}{2}$ if $i \ne j$.

Although the matrix of $\mathrm{Corr}\{W_i, W_j\}$ is of this form, the correlation coefficient
when $i \ne j$ is not $\tfrac{1}{2}$; in fact, as $k$ approaches infinity this correlation coefficient
approaches $\theta^*/(\theta^* + 1)$. Because of the lack of appropriate tables this approach













was abandoned. The approach described in the next section yielded the desired 
results. 
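
To see how far the common correlation of the $W_i$ is from the tabulated value $\tfrac{1}{2}$, one can evaluate the ratio $[(k + 1)\theta^* - 1] / [(k + 2)\theta^* + (k - 2)]$, which follows from the multinomial moments of the differences $W_i$ in the l.f.c. The short sketch below does this; the function name is ours, introduced only for illustration.

```python
def corr_w(k, theta_star):
    # Common correlation of W_i and W_j (i != j) in the least favorable
    # configuration: ((k+1)*theta - 1) / ((k+2)*theta + (k-2)).
    return ((k + 1) * theta_star - 1) / ((k + 2) * theta_star + (k - 2))


for k in (3, 10, 100, 10_000):
    print(k, corr_w(k, 2.0))
```

At $\theta^* = 1$ the ratio equals $\tfrac{1}{2}$ for every $k$, but for $\theta^* > 1$ it moves toward $\theta^*/(\theta^* + 1)$ as $k$ grows, which is why the existing tables could not be used with the $W_i$.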


6.2. The arcsin transformation and the normal approximation. 
6.2.1. Derivation of formulae for the approximation. 


We next consider the chance variables 


TABLE A-4
Exact Probability of a Correct Selection for k = 4 and selected θ* and N when
$p_{[4]}/p_{[3]} = p_{[4]}/p_{[2]} = p_{[4]}/p_{[1]} = \theta^*$


6° 








1.02 1.04 1.06 | 1.08 | 1.10 1.20 


wali 


. 253731] .257426| 261084 -264706| .268293) .285714 





am. i smi ae 1.60 











1 | | | 302326) .318182) .333333| .347826 
2.258781] .257426) .261084) .264706) 268293) .285714) .302326 .318182| . 333333) .347826 
3] .254673] .250318) .263935, .268522) .273081| .205432) .317041) .337904) .358025| .37414 
4| .255611} 261192) .266744) .272264) .277752| .304688) .330730| .355833| .379973| .403143 
5] .256139| .262250| .268331) 274382) .280401| .309077) .338609| .366224) .302775) .418245 
} 
6| 256583] .263145| .269686, .276203) .282604 disnl 345828) .375948| .404969| .432836 
7| .257172| .264331| .271472, .278595| .285604| .320770 .354949| .388029| .419872| .450388 
8| .257687| .265365| .273030, .280677| .288304| .326017) .362783| .308342| .432516| .465190 
9] .258099| 266195) .274282| .282357) .200415| .330318| .360274| .406967| .443171| .477740 
0} .258492| .266989| .275485) .283975| .202454| .334516| .375660| .415493) .453734| .490198 


11} .258928| .267868) .276814) .285758) .294695| .339075| .382512) .424536) .464814) .503121 
12} .259327) .268673) .278029, .287388) .296744| .343240) -388764)| -432773| .474882| .514835 
13] .259681) .269389) .279113) .288846) .298580) -347007| - 394460) -440317| .484138) .525629 
14) .260022| .270078) .280158) .290253) .300355) .350670) .400013| .447681/ .493169| .536145 


15} .260383| .270808| .281264 .291741| .302228) 354499) -405770| .455249) .502373) .546771 


| i 
16) .260724| .271497| .282308, .293144) 303995) . 358115, .411204) .462384) .511035) .556749 
17} .261038| .272133|} .283272 .294443) 305634 361490) .416297| .469092) .519189) .566146 
18| .261340) .272747| .284205, .295701 307222, 364774) -421263, 475631 -527129) -575274 
19} .261657| .273387 -285177) -297011, .308874) -368164) - 426353, .482287| .535152) 
20} .261959) .274001) .286109| .298266| .310457| .371415) .431231, .488657| .542817| .593158 
| | | | | | 





21) .262243) .274578| .286985, . 299447) .311949) .374495| .435869) .494723) .550117| .601462 
22) .262519| .275137| .287836| .300597| .313402 .377504| 440402) -500650) .557238) .609543 
23) .262804) .275716) . 288716) .301784) .314900) .380587) 445020) 506648) -564399) .617614 
24| .263079| .276276| .289567| .302931| .316349| .383569) -449483) .512437| .571294) .625364 
25) .263341) .276807| .290375| .304023| .317730) .386423| .453764) .517995) .577911| .632792 


26| .263595| .277324| .201164| .305000| .319079| .389219| .457960| .523439 .584380, 640033 
7| .263857| .277857| .291975| .306185| .320463| .392071| .462216| .528926| .590860| .647242 
28| .264112| .278375| .202763, .307249| .321808| .394844| .466352| .534250, .597133| .654198 
29| .264355| .278870| .293518| .308270| .323100| .397516| .470344| .539390| .603185| .660898 
30| .264592| .279355| .294257| .309271| .324366, .400140) 474265) .544434) .609112| .667440 



















TABLE A-4 (Continued) 


2.00 2.20 


-375000| .387755) . -423077| . 
-361702) .375000) .387755) . .423077| . 
.396088| .414063) .431359) . .479404| . 
-425352) .446615) .466955) . .522732) . 
.442630| .465942) .488203) . .548979) . 


| .459521| .485016) .509329) . .575418) . 
.479526| .507268) .533616) . 604569) . 
-496297| .525812) .553736) . .628297| . 
| .510588} .541676) .571001) . -648751) . 
| .524771 - 557400) .588076| .616825) .668762) . 


.539326| .573366) .605234; .634964| .688278) . 
-552481) .587752| .620642) .651193| .705602) . 
564618) .601028 -634853) -666144) .721496) . 
.576412) .613886) .648563) .680503) .736604) . 
. 588232) .626666) .662079) .694543) .751145) . 


-599300| .638596) .674651| .707554) .764506) . 
-609715| .649804) .686436) .719716) .776908) . 
-619802| .660620) .697760) .731348) .788646) . 
.629849| .671313) .708875) .742683) .799919) . 
630802 .681436| .719356| .753325) .810407| . 


648459] .691030) .729261) .763347| .820203) . 
.657254| .700300} .738788] .772941) .829482| . 
.665978| .709434| .748111] .782265| .838375| . 
.674328| .718142| .756962} .791078| .846700) . 
682313) .726446| .765375) .799421) .854511 





-690070| .734481| .773477| .807418) .861918) . 
.697743| .742376) .781387| .815172) .869005) . 
.705121| .749939) .788930) .822533) .875664) . 
-712210| .757181| .796128) .829528) .881935) . 
.719106| .764196 -803068) -836237| .887883 


(12)  $Z_i = 2\arcsin\sqrt{Y_{(k)N}/N} - 2\arcsin\sqrt{Y_{(i)N}/N}$  ($i = 1, 2, \ldots, k - 1$)


and for arbitrary $k$ we shall write (9) as


(13)  $\phi_k \approx \Pr\{Z_1 \ge 0,\; Z_2 \ge 0,\; \ldots,\; Z_{k-1} \ge 0\}$.














The chance variables $Z_i$ ($i = 1, 2, \ldots, k - 1$) also have a $(k - 1)$-variate
distribution, with means, variances, and covariances which can be expressed as
power series in $1/N$. In the l.f.c. these power series involve only $k$, $N$ and $\theta^*$.
They are found in the following way. We expand $2\arcsin (Y_{(i)N}/N)^{1/2}$, considered
as a function of $Y_{(i)N}/N$, in a Taylor series around the point $Y_{(i)N}/N =
p_{[i]}$, obtaining

(14)  $2\arcsin\sqrt{Y_{(i)N}/N} = f(p_{[i]}) + f'(p_{[i]})\left(\dfrac{Y_{(i)N}}{N} - p_{[i]}\right)
        + \dfrac{1}{2} f''(p_{[i]})\left(\dfrac{Y_{(i)N}}{N} - p_{[i]}\right)^2
        + \dfrac{1}{6} f'''(p_{[i]})\left(\dfrac{Y_{(i)N}}{N} - p_{[i]}\right)^3
        + \dfrac{1}{24} f^{iv}(p_{[i]})\left(\dfrac{Y_{(i)N}}{N} - p_{[i]}\right)^4 + O\!\left(\dfrac{1}{N^3}\right)$




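where

(15)  $f(p_{[i]}) = 2\arcsin\sqrt{p_{[i]}}$

      $f'(p_{[i]}) = \dfrac{1}{\sqrt{p_{[i]}(1 - p_{[i]})}}$

      $f''(p_{[i]}) = \dfrac{2p_{[i]} - 1}{2\,[p_{[i]}(1 - p_{[i]})]^{3/2}}$

      $f'''(p_{[i]}) = \dfrac{8p_{[i]}^2 - 8p_{[i]} + 3}{4\,[p_{[i]}(1 - p_{[i]})]^{5/2}}$

      $f^{iv}(p_{[i]}) = \dfrac{3(2p_{[i]} - 1)(8p_{[i]}^2 - 8p_{[i]} + 5)}{8\,[p_{[i]}(1 - p_{[i]})]^{7/2}}$.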


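Now $E\{Z_i\}$, $\mathrm{Var}\{Z_i\}$, and $\mathrm{Cov}\{Z_i, Z_j\}$ all can be expressed in terms of
$E\{2\arcsin (Y_{(i)N}/N)^{1/2}\}$, $E\{[2\arcsin (Y_{(i)N}/N)^{1/2}]^2\}$ and
$E\{[2\arcsin (Y_{(i)N}/N)^{1/2}][2\arcsin (Y_{(j)N}/N)^{1/2}]\}$, and these latter involve the
moments of multinomially distributed variables which are well known. Thus, for example,

(16)  $E\{2\arcsin\sqrt{Y_{(i)N}/N}\} = f(p_{[i]}) + f''(p_{[i]})\,\dfrac{p_{[i]}(1 - p_{[i]})}{2N}
        + f'''(p_{[i]})\,\dfrac{p_{[i]}(1 - p_{[i]})(1 - 2p_{[i]})}{6N^2}
        + f^{iv}(p_{[i]})\left[\dfrac{p_{[i]}^2(1 - p_{[i]})^2}{8N^2} + \dfrac{p_{[i]}(1 - p_{[i]})(1 - 6p_{[i]} + 6p_{[i]}^2)}{24N^3}\right]
        + O(1/N^3)$  ($i = 1, 2, \ldots, k - 1$)

which immediately yields $E\{Z_i\}$ up to terms of order $1/N^2$.
In the l.f.c. we use (2) and obtain after simplifications

(17a)  $E\{Z_i\} = a_1 + \dfrac{a_2}{N} + O(1/N^2)$
                                                        ($i = 1, 2, \ldots, k - 1$)
(17b)  $\mathrm{Var}\{Z_i\} = \dfrac{b_1}{N} + \dfrac{b_2}{N^2} + O(1/N^3)$

where

(18a)  $a_1 = 2\arcsin\sqrt{\dfrac{\theta^*}{\theta^* + k - 1}} - 2\arcsin\sqrt{\dfrac{1}{\theta^* + k - 1}}$

(18b)  $a_2 = \dfrac{\theta^* + k - 3}{4\sqrt{\theta^* + k - 2}} + \dfrac{\theta^* - k + 1}{4\sqrt{(k - 1)\theta^*}}$

(18c)  $b_1 = 2 + 2\sqrt{\dfrac{\theta^*}{(k - 1)(\theta^* + k - 2)}}$

(18d)  $b_2 = \dfrac{(\theta^* + k - 1)^2(k\theta^* + k - 2) - \theta^*(k - \theta^* - 1)(\theta^* + k - 3)}{4\sqrt{\theta^*}\,[(k - 1)(\theta^* + k - 2)]^{3/2}}
        + \dfrac{3(\theta^* + k - 1)^2(k\theta^* + k - 2)}{8\theta^*(k - 1)(\theta^* + k - 2)} - 1$.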


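In addition,

(19)  $\mathrm{Corr}\{Z_i, Z_j\} = \dfrac{1 + 2\sqrt{\dfrac{\theta^*}{(k - 1)(\theta^* + k - 2)}} - \dfrac{1}{\theta^* + k - 2} + \dfrac{c_1}{N} + O\!\left(\dfrac{1}{N^2}\right)}{2 + 2\sqrt{\dfrac{\theta^*}{(k - 1)(\theta^* + k - 2)}} + \dfrac{b_2}{N} + O\!\left(\dfrac{1}{N^2}\right)}$  ($i \ne j$)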


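where

(20)  $c_1 = \dfrac{(\theta^* + k - 1)^2(k\theta^* + k - 2) + \theta^*(\theta^* - k + 1)(\theta^* + k - 3)}{4\sqrt{\theta^*}\,[(k - 1)(\theta^* + k - 2)]^{3/2}}
        + \dfrac{3(\theta^* + k - 1)^2}{8\theta^*(k - 1)}
        - \dfrac{2(\theta^* + k - 1)^2(\theta^* + k - 2) - (\theta^* + k - 3)^2}{8(\theta^* + k - 2)^3} - \dfrac{1}{2}$.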


The following table shows how (19) varies with k and 6* when N is large. (In 
these computations, N in (19) is assumed to be infinite.) 
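
The dependence of (19) on $k$ and $\theta^*$ for infinite $N$ can be recomputed directly. The sketch below assumes our reading of the leading terms of (19), namely numerator $1 + 2r - 1/(\theta^* + k - 2)$ and denominator $2 + 2r$ with $r = \sqrt{\theta^*/[(k - 1)(\theta^* + k - 2)]}$; the function name is ours.

```python
from math import sqrt


def corr_z_limit(k, theta_star):
    # Limit of (19) as N -> infinity: the c_1/N, b_2/N and higher-order
    # terms are dropped, leaving only the leading terms.
    r = sqrt(theta_star / ((k - 1) * (theta_star + k - 2)))
    return (1 + 2 * r - 1 / (theta_star + k - 2)) / (2 + 2 * r)


for k in (3, 4, 10, 36):
    print(k, [round(corr_z_limit(k, t), 3) for t in (1.1, 1.5, 2.0, 3.0)])
```

For $k = 36$ and $\theta^* = 1.10$ this limiting value is about 0.501, the figure quoted in the numerical example of section 8.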










It is seen from (19) with $N = \infty$ (and from the above table) that as $\theta^*$
approaches unity, (19) approaches $\tfrac{1}{2}$ for any $k$; also, as $k$ approaches infinity, (19)
approaches $\tfrac{1}{2}$ for any $\theta^*$.

As with the $W_i$, when (1) holds the $Z_i$ have a common mean and a common
variance, and all pairs $(Z_i, Z_j)$ with $i \ne j$ have a common correlation. Also as
$N$ approaches infinity it can be shown that the joint distribution of the standardized
variables approaches a $(k - 1)$-variate normal distribution with zero
means, unit variances, and a correlation matrix of the form of (11); and the common
correlation coefficient (19) for $i \ne j$ is approximately $\tfrac{1}{2}$. Thus, the tables in
[1], [5], and [9] can be used for finding the volume under the $(k - 1)$-variate
normal surface, and hence an approximation to the probability of a correct
selection can be obtained.


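6.2.2. The approximation. To obtain the approximation we let

(21a)  $A = a_1 + \dfrac{a_2}{N}$

and

(21b)  $B = \dfrac{b_1}{N} + \dfrac{b_2}{N^2}$.

Then

(22)  $\phi_k \approx \displaystyle\int_{-A/\sqrt{B}}^{\infty} \cdots \int_{-A/\sqrt{B}}^{\infty} g(t_1, t_2, \ldots, t_{k-1})\, dt_1\, dt_2 \cdots dt_{k-1}$,

where $g(t_1, t_2, \ldots, t_{k-1})$ is the $(k - 1)$-variate normal density function with
zero means, unit variances, and correlation matrix (11). Since [9] gives $P_k(\lambda)$
as a function of $\lambda$ where

(23)  $P_k(\lambda) = \displaystyle\int_{-\lambda/\sqrt{2}}^{\infty} \cdots \int_{-\lambda/\sqrt{2}}^{\infty} g(t_1, t_2, \ldots, t_{k-1})\, dt_1\, dt_2 \cdots dt_{k-1}$,

we obtain

(24)  $\phi_k \approx P_k\!\left(A\sqrt{2/B}\right)$.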
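
The approximation can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' procedure: they read $P_k(\lambda)$ from the tables in [9], whereas the code evaluates it by a one-dimensional integral, $\int \varphi(x)\,\Phi(x + \lambda)^{k-1}\,dx$, which is a standard representation of the equicorrelated normal probability with correlation $\tfrac{1}{2}$. The coefficient formulas are our reading of $a_1, a_2, b_1, b_2$ as functions of $k$ and $\theta^*$ in the l.f.c., and all function names are ours.

```python
from math import asin, erf, exp, pi, sqrt


def coefficients(k, t):
    # a_1, a_2, b_1, b_2 for theta* = t in the least favorable configuration.
    d = t + k - 1
    a1 = 2 * asin(sqrt(t / d)) - 2 * asin(sqrt(1 / d))
    a2 = (t + k - 3) / (4 * sqrt(t + k - 2)) + (t - k + 1) / (4 * sqrt((k - 1) * t))
    b1 = 2 + 2 * sqrt(t / ((k - 1) * (t + k - 2)))
    cross = (d * d * (k * t + k - 2) - t * (k - t - 1) * (t + k - 3)) / (
        4 * sqrt(t) * ((k - 1) * (t + k - 2)) ** 1.5
    )
    b2 = cross + 3 * d * d * (k * t + k - 2) / (8 * t * (k - 1) * (t + k - 2)) - 1
    return a1, a2, b1, b2


def std_normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))


def p_k(lam, k, steps=4000, lo=-9.0, hi=9.0):
    # P_k(lambda) for common correlation 1/2, computed with the trapezoid
    # rule applied to phi(x) * Phi(x + lambda)^(k - 1).
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = exp(-0.5 * x * x) / sqrt(2 * pi)
        total += weight * density * std_normal_cdf(x + lam) ** (k - 1)
    return total * h


def approx_pcs(k, t, n):
    # Large-sample approximation: P_k(A * sqrt(2 / B)).
    a1, a2, b1, b2 = coefficients(k, t)
    A = a1 + a2 / n
    B = b1 / n + b2 / n**2
    return p_k(A * sqrt(2 / B), k)


print(approx_pcs(2, 1.10, 30), approx_pcs(3, 1.10, 15))
```

For $k = 2$ the integral reduces to $\Phi(\lambda/\sqrt{2})$, which gives a quick check of the quadrature. Small differences from tabled approximate values can arise because table look-up requires interpolation in $\lambda$.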


6.2.3. Tables based on the approximation. Some computations were made
to indicate the goodness of the approximation. These approximate probabilities
are listed along with the corresponding exact probabilities (which were extracted
from Tables A-2, A-3, and A-4) in Tables B-2, B-3, and B-4. They were computed
as follows: The quantities $a_1$, $a_2$, $b_1$, and $b_2$ were computed for each $k$
and $\theta^*$ using (18a), (18b), (18c), and (18d), respectively; then $A$ and $B$ were
computed for each $N$ using (21a) and (21b), respectively; then $\lambda = A(2/B)^{1/2}$







TABLE B-2
Exact¹ and Approximate¹ Probability of a Correct Selection for k = 2 when
$p_{[2]}/p_{[1]} = \theta^*$


N= 30 


-52163 
-521445 
-60293 
-602107 
86481 
863787 
- 96836 
‘ -968861 
‘ , ‘ -99781 
| . 998216 
-99973 . 99998 : 1.00000 


J bg ! 1.000000 


2.00 
3.00 
10.00 


‘ Exact probabilities are given to six decimal places; approximate probabilities are 
given to five decimal places. 


TABLE B-3
Exact¹ and Approximate¹ Probability of a Correct Selection for k = 3 when
$p_{[3]}/p_{[2]} = p_{[3]}/p_{[1]} = \theta^*$

  θ*                     N = 15      N = 20      N = 25

  1.02   approximate     .34593      .34789      .34963
         exact           .346193     .348131     .349923
  1.10   approximate     .39577      .40571      .41451
         exact           .397003     .406805     .415882
  1.50   approximate     .61374      .65541      .69061
         exact           .617984     .658865     .694548
  2.00   approximate     .79077      .84134      .87854
         exact           .796463     .845490     .882401
  3.00   approximate     .93681      .96644      .98196
         exact           .942521     .969696     .983987
 10.00   approximate     .99958      .99995     1.00000
         exact           .999881     .999991     .999999

¹ Exact probabilities are given to six decimal places; approximate probabilities are
given to five decimal places.











was computed; then [9] was entered with $\lambda$ and $k$, and $P_k(\lambda)$ was read out.
(See section 6.2.4 for a description of the tables in [1], [5], and [9].) Comparison
of the approximate and exact probabilities in Tables B-2, B-3, and B-4 indicates
that the approximation is excellent even when $N$ is only moderately large. Of
course, the approximation breaks down if $\theta^*$ is large and $k$ is small, for then (19)
differs too much from $\tfrac{1}{2}$.







TABLE B-4
Exact¹ and Approximate¹ Probability of a Correct Selection for k = 4 when
$p_{[4]}/p_{[3]} = p_{[4]}/p_{[2]} = p_{[4]}/p_{[1]} = \theta^*$








6 geon-:| gos -| fon.) woe N= 30 
ata tirl eBid latent eee eel 

1.02 | .25812 | = .25907 26154 | .26292 26315 
.258492 | .260383 .261959 | 263341 264592 

1.10 | 20066 | 30022 ‘3080 | 31567 32282 
.292454 | .302228 | .310457 | .317730 | 324366 

1.50 | .44645 | .40462 | 53529 | 57078 «=| «(60235 
| .453734 | .502373 542817 | 577911 .609112 

2.00 | .60566 68421 | .74450 | 79203 | .83003 
| 616825 | .694543 753325 | .799421 | .836237 

3.00 | .80045 88202 | .92043 | 95751 .97430 
| .814697 | .891625 | .935605 | 961441 | .976808 

10.00 | .98975 | .99878 | .99988 | 1.00000 | 1.00000 
| .996024 | .999622 | .999963 | -999997 | — 1.000000 


' Exact probabilities are given to six decimal places; approximate probabilities are 
given to five decimal places. 


The problem of including a correction for continuity in the approximation 
for small N was considered, but the authors were not successful in finding one 
which would give uniform improvements over the approximation finally adopted. 


6.2.4. Reference tables. Since [9], Table Al of [5], and Table 1 of [1] employ 
different notations, some comments about these tables might be appropriate. 

In [9], $P_k(\lambda)$ is tabulated as a function of $\lambda$ for various $k$. In the notation of
these tables, $\lambda = x$ and $P_k(\lambda) = P(1, k)$ which is a function of $\lambda$. The tabulations
are for $\lambda$ = 0.00 (0.01) 6.09 and $k$ = 2(1)10. These tables were originally
computed for the purpose of preparing Table 1 of [1].

In Table 1 of [1], $\lambda$ is tabulated as a function of $k$ and $P_k(\lambda)$. In the notation
of these tables, A = N’”), and the columns to be entered are those headed t = 1. 
The tabulations are for $P_k(\lambda)$ = 0.10 (0.05) 0.80 (0.02) 0.90 (0.01) 0.99, 0.9950,
0.9990, 0.9995 and $k$ = 2(1)10.

In Table A1 of [5], $\lambda$ is tabulated as a function of $k$ and $P_k(\lambda)$. In the notation
of these tables $\lambda = u_\alpha(n)$, $k = n + 1$, and $P_k(\lambda) = 1 - \alpha$. The tabulations
are for $P_k(\lambda)$ = 0.75, 0.90, 0.95, 0.975, 0.99 and $k$ = 2(1)51.


7. Choice of N to meet the requirement. We now shall show how to deter- 
mine N*, the smallest N which will guarantee the requirement. To guarantee 
the requirement in the least favorable configuration we must have 


(25) [ [ + f g(t, , te, +++ , tes) dt; dt--- dh. = P*. 
A A Ln A 
VJB VB VB 


Hence, N must be chosen large enough to make 







A a; + a,/N A 
yp VB Vb/N+b/N° V2 


where A is determined from [1], [5], or [9] to satisfy the equation

(27) P_k(A) = P*.

We consider the inequality in (26) and note that the middle expression approaches
infinity as N grows large. Therefore, since a_1 is positive there exists
(for any a_1, a_2, b_1, b_2, and A) a smallest integer, which we denote by N*, with
the property that the inequality (26) holds. When N is large, the terms in (26)
involving a_2 and b_2 can be ignored, and N* is approximately the smallest integer
equal to or greater than

\[ \frac{A^{2} b_1}{2 a_1^{2}}. \tag{28} \]

A tendency for the approximation (22) to underestimate the exact value of 
(3) (except for k = 2 when N is even) is evident from examination of Tables 
B-1, B-2, and B-3. Hence, one usually can expect the value of N* obtained by 
the methods described above to err on the conservative side, i.e., to be somewhat 
larger than the exact value of N* which is required. 


8. Numerical example. Suppose that one were interested in selecting that 
one of the thirty-six bettable numbers on an unbalanced roulette wheel which 
has the largest probability associated with it. Suppose further that he specifies 
that if this probability is at least 10% larger than the second-largest probability, 
he wishes to make a correct selection with probability at least 0.90. Then we 
have k = 36, θ* = 1.10, and P* = 0.90; the least favorable configuration is

\[ p_{[1]} = \cdots = p_{[35]} = \frac{1}{36.1} = 0.02770083, \qquad p_{[36]} = \frac{1.1}{36.1} = 0.03047091. \tag{29} \]

We can anticipate here that N* will be large, and hence we need compute only

\[ a_1 = 2 \arcsin\sqrt{0.03047091} - 2 \arcsin\sqrt{0.02770083} = 0.0164885 \tag{30a} \]

and

\[ b_1 = 2 + 2\sqrt{1.1/[35(35.1)]} = 2.05985. \tag{30b} \]

(Note: A comprehensive set of tables of arcsin x is found in [8].) Table A1 of
[5] (with n = 35, α = 0.10 in the notation of that table) yields A = 3.5351.
(From the table in section 6.2.1 we see that (19) is approximately 0.501, and we
would expect the approximation to be a good one.) Using (28) we compute N*
as the smallest integer equal to or greater than (3.5351)²(2.05985)/2(0.0164885)²
= 47,341.7. Thus in order to guarantee the requirement, one must take at least










47,342 observations, i.e., 47,342 spins of the wheel. (Of course, the last digits in 
N are not accurate, but they were retained to indicate the method.) 
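The arithmetic of this example can be retraced with a short Python sketch. It assumes the large-N approximation (28), takes A = 3.5351 as quoted above from Table A1 of [5], and evaluates a_1 and b_1 exactly as written in (30a) and (30b). The function and variable names are illustrative only and do not come from the paper.

```python
import math

def approx_sample_size(A, a1, b1):
    """Value of the large-N approximation (28), A^2 * b1 / (2 * a1^2), before rounding up."""
    return A ** 2 * b1 / (2 * a1 ** 2)

theta_star = 1.10   # specified ratio of largest to second-largest probability
k = 36              # number of bettable numbers on the wheel

# Least favorable configuration (29).
p_small = 1 / (theta_star + k - 1)            # 1/36.1
p_large = theta_star / (theta_star + k - 1)   # 1.1/36.1

# (30a) and (30b), transcribed from the example.
a1 = 2 * math.asin(math.sqrt(p_large)) - 2 * math.asin(math.sqrt(p_small))
b1 = 2 + 2 * math.sqrt(1.1 / (35 * 35.1))

A = 3.5351          # table constant quoted in the text, not computed here
n_star = math.ceil(approx_sample_size(A, a1, b1))
print(f"a1 = {a1:.7f}, b1 = {b1:.5f}, approximate N* = {n_star}")
```

Because a_1 and b_1 are carried here at full floating-point precision rather than rounded to the digits printed in (30a) and (30b), the last digit or two of the result may differ slightly from the value in the text, which is in keeping with the remark that the last digits of N are not accurate.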

For illustrative purposes we have computed the following table which gives
N* for selected values of k and P* when θ* = 1.1 is specified.


              k
  P*  |     2 |     3 |      6 |     36
 0.75 |   201 |       |  2,475 | 30,775
 0.90 |   724 | 1,618 |  4,698 | 47,342
 0.95 | 1,193 | 2,339 |  6,383 | 59,080
 0.99 |       | 4,255 | 10,303 | 84,952





9. Generalizations. Thus far in this paper we have considered only Goal 1:
“To select the event associated with p_[k].” However, the same approach can be
used in connection with different goals, or more general goals. For example, we
might consider Goal 2: “To select the event associated with p_[1],” or Goal 3:
“To select the events associated with p_[k−t+1], …, p_[k] without regard to order”
for any 1 ≤ t ≤ k − 1. Clearly, Goals 1 and 2 are special cases of Goal 3 since
the selection of the k − 1 largest is equivalent to the selection of the one smallest.
Table A1 of [5], Table 1 of [1], and [9] all provide the constants necessary to deal
with Goal 2, but only the two latter tables provide the constants necessary to
deal with Goal 3.


10. Acknowledgment. The authors are indebted to Mr. Richard Lesser, 
Director of the Cornell Computing Center, who supervised the computation 
of the entries in the various tables. 


REFERENCES 


[1] R. E. Bechhofer, “A single-sample multiple-decision procedure for ranking means of
normal populations with known variances,” Ann. Math. Stat., Vol. 25 (1954),
pp. 16-39.

[2] R. E. Bechhofer and M. Sobel, “A single-sample multiple-decision procedure for
ranking variances of normal populations,” Ann. Math. Stat., Vol. 25 (1954),
pp. 273-289.

[3] R. E. Bechhofer and M. Sobel, “A sequential multiple-decision procedure for selecting
the multinomial event with the largest probability (preliminary report),”
Abstract, Ann. Math. Stat., Vol. 27 (1956), p. 861.

[4] R. E. Bechhofer and M. Sobel, “Non-parametric multiple-decision procedures for
selecting that one of k populations which has the highest probability of yielding
the largest observation (preliminary report),” Abstract, Ann. Math. Stat.,
Vol. 29 (1958), p. 325.

[5] S. S. Gupta, “On a decision rule for a problem in ranking means,” Institute of Statistics
Mimeograph Series No. 150, University of North Carolina, May 1956.

[6] M. J. Huyett and M. Sobel, “Selecting the best one of several binomial populations,”
The Bell System Technical Journal, Vol. 36 (1957), pp. 537-576.

[7] H. Kesten and N. Morse, “A property of the multinomial distribution,” Ann.
Math. Stat., Vol. 30 (1959), pp. 120-127.

[8] Mathematical Tables Project, U. S. National Bureau of Standards, Table of Arcsin x,
Columbia University Press, New York, 1945.

[9] D. Teichroew, Probabilities associated with order statistics in samples from two normal
populations with equal variance, Chemical Corps Engineering Agency, Army
Chemical Center, Maryland, December 1955.





A PROPERTY OF THE MULTINOMIAL DISTRIBUTION 
By Harry Kesten¹ and Norman Morse²


Cornell University 


1. Introduction. The purpose of this paper is to prove a property of the
multinomial distribution which is fundamental to the choice of sample size for
the selection procedure described in the preceding paper [1] of this issue. As in
[1], we let p_[1] ≤ p_[2] ≤ … ≤ p_[k] denote the ranked multinomial probabilities
and let φ = φ(p_[1], …, p_[k]) be the probability of a correct selection when the
selection procedure in section 4 of [1] is used. We wish to prove that for any
integers N ≥ 1, k ≥ 2 and for any number 1 < θ* < ∞, φ is minimized among
all configurations with p_[k] ≥ θ* p_[i] (i = 1, …, k − 1) by the configuration
p_[1] = p_[2] = … = p_[k−1] = p_[k]/θ* = 1/(θ* + k − 1). This configuration is
called “least favorable” because of this property.
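For small N and k the property can be checked by brute force. The sketch below is an illustration, not the authors' method: it assumes that the selection procedure picks the event with the largest observed count and breaks ties among the leaders at random with equal probabilities, and it enumerates every outcome of the multinomial distribution. The function name and the second configuration are hypothetical choices made for the example.

```python
from itertools import product
from math import factorial, prod

def prob_correct_selection(p, n):
    """Exact probability that the last cell (assumed to carry the largest
    probability) is selected after n observations, when the cell with the
    largest count wins and ties are split evenly (an assumed rule)."""
    k = len(p)
    total = 0.0
    for counts in product(range(n + 1), repeat=k):
        if sum(counts) != n:
            continue
        top = max(counts)
        if counts[-1] != top:
            continue  # the best cell is not among the leaders
        ties = counts.count(top)
        coef = factorial(n)
        for c in counts:
            coef //= factorial(c)  # multinomial coefficient, kept as an integer
        total += coef * prod(pi ** c for pi, c in zip(p, counts)) / ties
    return total

theta_star, k, n = 1.5, 3, 6
lfc = [1 / (theta_star + k - 1)] * (k - 1) + [theta_star / (theta_star + k - 1)]
other = [0.1, 0.3, 0.6]  # also satisfies p_[k] >= theta* p_[i]
print(prob_correct_selection(lfc, n), prob_correct_selection(other, n))
```

Under these assumptions the first printed value should not exceed the second, since the second configuration also satisfies the constraint p_[k] ≥ θ* p_[i].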

The theorem on least favorable configurations, proved below in section 3, 
merely assembles the results of the preceding lemmas in a rather obvious way. 
The main ideas of the result are contained in the two lemmas of section 2. Note
that Lemma 1, proved below for k ≥ 3, is not needed to prove the theorem for
the case k = 2.

We should like to thank Dr. Milton Sobel for his valuable comments and 
suggestions. 


2. The lemmas. It was found convenient to deviate in the following respect
from the notation used in [1]. Let p_k denote the largest of the k probabilities, but
let the other (k − 1) probabilities p_1, …, p_{k−1} be unranked among themselves.
Let E_i be the event associated with p_i (i = 1, …, k), and let y_{iN} be the number
of occurrences of event E_i after N observations (i = 1, …, k); of course we
have 0 ≤ y_{iN} ≤ N (i = 1, …, k) and Σ_{i=1}^{k} y_{iN} = N. (For notational convenience
we use the same symbol for a chance variable and its observed value.)
In correspondence to the notation just given, the probability φ of correctly
choosing event E_k is in this paper a function of the p_i's as defined above rather
than a function of the p_[i]'s as it is in [1]. It is sufficient to restrict our attention
to configurations with p_k ≥ θ* max(p_1, …, p_{k−1}).

Suppose k ≥ 3 and any k − 2 of the p_i's, including p_k, are held fixed. For
notational convenience, we take the two unfixed probabilities to be p_1 and p_2,
and we arbitrarily call p_1 the one which is equal to or greater than the other, so
that we have p_1 ≥ p_2. Note that since the sum (p_1 + p_2) is fixed, φ can be


Received February 5, 1958. 

¹ Research supported by the Office of Naval Research.

² Research supported by the United States Air Force through the Air Force Office of
Scientific Research of the Air Research and Development Command, under Contract No.
AF 18(600)-331. Reproduction in whole or in part is permitted for any purpose of the United
States Government.




regarded as a function of p_1 only, say ψ_1(p_1), with (p_1 + p_2)/2 ≤ p_1 ≤
(p_1 + p_2).


Lemma 1. Let k ≥ 3, and let (p_1 + p_2), p_3, …, p_k be fixed as above. Then
ψ_1(p_1) is a non-increasing function of p_1, where p_1 lies in the interval (p_1 + p_2)/2 ≤
p_1 ≤ (p_1 + p_2). [Hence φ is minimized over all vectors with fixed p_3, …, p_k
and such that p_k ≥ θ* max(p_1, …, p_{k−1}) by taking p_1 = min(p_1 + p_2, p_k/θ*).]

Proof of Lemma 1. φ can be rewritten

\[ \phi = \sum \frac{1}{s}\, \frac{N!}{w_N!\, y_{3N}! \cdots y_{kN}!}\, q^{w_N} p_3^{y_{3N}} \cdots p_k^{y_{kN}} \sum_{y_{1N} = w_N - y_{kN}}^{y_{kN}} \binom{w_N}{y_{1N}} r^{y_{1N}} (1 - r)^{w_N - y_{1N}}\, S(y_{1N}, y_{2N}, y_{kN}), \tag{1} \]

where w_N = y_{1N} + y_{2N}, q = (p_1 + p_2), r = p_1/q; the outer summation is over
all (k − 1)-vectors (w_N, y_{3N}, …, y_{kN}) such that w_N + Σ_{i=3}^{k} y_{iN} = N and
y_{kN} ≥ y_{iN} (i = 3, …, k − 1); (s − 1) is the number of y_{iN}'s (i = 3, …, k − 1)
which equal y_{kN}; and S(y_{1N}, y_{2N}, y_{kN}) is defined as follows:

\[ S(y_{1N}, y_{2N}, y_{kN}) = \begin{cases} 1 & \text{if } \max(y_{1N}, y_{2N}) < y_{kN}, \\ s/(s+1) & \text{if } \max(y_{1N}, y_{2N}) = y_{kN} \text{ and } y_{1N} \ne y_{2N}, \\ s/(s+2) & \text{if } y_{1N} = y_{2N} = y_{kN}. \end{cases} \tag{2} \]

The inner summation in (1) insures that y_{kN} ≥ y_{iN} for i = 1, 2. Note that s is
actually a function of (y_{3N}, …, y_{kN}).
To prove Lemma 1 it is sufficient to prove the monotonicity property for the
inside sum in (1) with w_N and y_{kN} fixed, for then φ will be termwise monotonic.
Using (2), it is easy to see that for w_N and y_{kN} fixed the inside sum in (1) is equal
to either

\[ \Pr\{w_N - y_{kN} < y_{1N} < y_{kN}\} + \frac{s}{s+1} \Pr\{y_{1N} = w_N - y_{kN}\} + \frac{s}{s+1} \Pr\{y_{1N} = y_{kN}\} \tag{3} \]

for the case w_N < 2y_{kN}, or

\[ \frac{s}{s+2} \Pr\{y_{1N} = y_{2N} = y_{kN}\} \tag{4} \]

for the case w_N = 2y_{kN}. Since the inner sum is empty for w_N > 2y_{kN}, no
other cases have to be considered.
Now for w_N ≤ 2a, define the function

\[ f(a; r) = \sum_{x = w_N - a}^{a} \frac{w_N!}{x!\,(w_N - x)!}\, r^{x} (1 - r)^{w_N - x}, \tag{5a} \]

which we rewrite as the difference of incomplete Beta functions:

\[ f(a; r) = \frac{w_N!}{a!\,(w_N - a - 1)!} \int_r^1 \left[ x^{a} (1 - x)^{w_N - a - 1} - x^{w_N - a - 1} (1 - x)^{a} \right] dx. \tag{5b} \]

For the case w_N > 2a, we take f(a; r) to be zero. (Also individual terms in (5a)
are zero if x < 0 or x > w_N.) The proof of Lemma 1 can now be completed by
noting that expression (3) equals

\[ \frac{1}{s+1} f(y_{kN} - 1; r) + \frac{s}{s+1} f(y_{kN}; r), \tag{6} \]

and that expression (4) equals

\[ \frac{s}{s+2} f(y_{kN}; r). \tag{7} \]

Since the integrand in (5b) is non-negative for the case w_N ≤ 2a + 1 and 1/2 ≤
r ≤ 1, it follows that f(a; r) is a non-increasing function of r. The expressions
(6) and (7) are therefore non-increasing functions of r, and this proves the lemma.
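The monotonicity of f(a; r) can be spot-checked numerically. The following sketch evaluates the finite sum (5a) directly, treating w_N as an ordinary argument `w`, and tabulates it on a grid of r between 1/2 and 1. The grid and the values of `w` and `a` are arbitrary illustrative choices, and a numerical check of this kind is of course no substitute for the argument via (5b).

```python
from math import comb

def f(a, w, r):
    """Sum (5a): Pr{w - a <= X <= a} for X ~ Binomial(w, r); taken as 0 when w > 2a."""
    if w > 2 * a:
        return 0.0
    lo, hi = max(w - a, 0), min(a, w)  # terms outside 0..w are zero
    return sum(comb(w, x) * r ** x * (1 - r) ** (w - x) for x in range(lo, hi + 1))

w, a = 7, 5
grid = [0.5 + 0.05 * i for i in range(11)]   # r = 0.50, 0.55, ..., 1.00
values = [f(a, w, r) for r in grid]
for r, v in zip(grid, values):
    print(f"r = {r:.2f}  f = {v:.6f}")
```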

Starting with any configuration for which p_k ≥ θ* p_i (i = 1, …, k − 1),
one can, by repeated application of Lemma 1, arrive at a new configuration which
has (k − g) of the p_i equal to zero, (g − 2) of the p_i equal to p_k/θ*, and the one
remaining p_i in the closed interval [0, p_k/θ*]. Moreover, the probability of a
correct selection for this configuration is at most as large as that for the original 
configuration. The purpose of the second lemma is to permit further reduction 
of the probability of a correct selection by starting with this new configuration 
and making changes of a second type. 

To be more precise, we assume that

\[ \begin{aligned} p_k &= \theta^* p_{k-1} = \cdots = \theta^* p_{k-g+2} = \frac{\theta^* (1 - p_{k-g+1})}{\theta^* + g - 2}, \\ 0 &\le p_{k-g+1} \le p_{k-g+2} = p_k/\theta^*, \\ p_1 &= \cdots = p_{k-g} = 0, \end{aligned} \tag{8} \]

where g is an integer 2 ≤ g ≤ k, θ* > 1, p_i ≥ 0 (i = 1, …, k), and Σ_{i=1}^{k} p_i = 1.
Note that under these conditions 0 ≤ p_{k−g+1} ≤ 1/(θ* + g − 1). Let p_{k−g+1} = p
to simplify notation. It follows from (8) that φ may be regarded as a function of
p only, say ψ_2(p).

Lemma 2. Under the conditions (8), ψ_2(p) is a non-increasing function of p,
where p lies in the interval 0 ≤ p ≤ 1/(θ* + g − 1). [Hence φ is minimized under
(8) by the configuration p_1 = … = p_{k−g} = 0, p_{k−g+1} = p_{k−g+2} = … = p_{k−1} =
p_k/θ* = 1/(θ* + g − 1).]


Proof of Lemma 2. Consider a multinomial problem in which only the g − 1
events E_{k−g+2}, …, E_k are involved; suppose the probabilities of these events
are given by 1/(θ* + g − 2), …, 1/(θ* + g − 2), θ*/(θ* + g − 2), respectively.
Let M observations be taken, and consider the quantities y_{k−g+2,M},
…, y_{k,M}. But now let an integer 0 ≤ c ≤ N be given which is to correspond to
the event E_{k−g+1}, and use the selection procedure of [1] to choose among the g
events E_{k−g+1}, …, E_k. That is, the constant c plays the same role in the selection
procedure that the chance quantity y_{k−g+1,M} ordinarily would; otherwise the
selection is made in the usual way. Let Q_M(c) be the probability of choosing
event E_k when this procedure is used.

φ can now be rewritten as

\[ \phi = \psi_2(p) = \sum_{y=0}^{N} \binom{N}{y} p^{y} (1 - p)^{N-y}\, Q_{N-y}(y). \tag{9} \]

Note that Q_{N−y}(y) is independent of p, so that ψ_2(p) is merely a linear combination
of binomial terms. One may rewrite (9) as

\[ \psi_2(p) = \sum_{y=0}^{N} \left[ Q_{N-y}(y) - Q_{N-y+1}(y - 1) \right] \sum_{x=y}^{N} \binom{N}{x} p^{x} (1 - p)^{N-x}, \tag{10} \]

where we set Q_{N+1}(−1) = 0. Since Σ_{x=y}^{N} C(N, x) p^x (1 − p)^{N−x} increases with p for
all 0 < y ≤ N (i.e., the binomial distribution shifts to the right if the probability
of a “success” increases), it clearly suffices for proving the lemma to
show that Q_{N−y}(y) is non-increasing in y.
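The passage from (9) to (10) is a summation by parts, and it holds for any coefficients in place of Q_{N−y}(y). The sketch below checks the identity numerically with a made-up, non-increasing list `q`, where `q[y]` merely stands in for Q_{N−y}(y); these values are placeholders chosen for illustration and are not computed from the selection procedure.

```python
from math import comb

def binom_pmf(n, y, p):
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

def psi_direct(q, p):
    """Form (9): binomial mixture of the coefficients q[y]."""
    n = len(q) - 1
    return sum(binom_pmf(n, y, p) * q[y] for y in range(n + 1))

def psi_by_tails(q, p):
    """Form (10): successive differences of q times binomial upper tails."""
    n = len(q) - 1
    total = 0.0
    for y in range(n + 1):
        previous = q[y - 1] if y > 0 else 0.0   # convention Q_{N+1}(-1) = 0
        tail = sum(binom_pmf(n, x, p) for x in range(y, n + 1))
        total += (q[y] - previous) * tail
    return total

q = [0.9, 0.7, 0.7, 0.4, 0.1, 0.0]   # hypothetical non-increasing stand-in, N = 5
for p in (0.1, 0.3, 0.6):
    print(p, psi_direct(q, p), psi_by_tails(q, p))
```

With a non-increasing `q`, the printed values should agree between the two forms and should not increase as p grows, which is the mechanism the proof relies on.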

From the definition of Q_M(c), it is clear that

\[ Q_{N-y-1}(y + 1) \le Q_{N-y-1}(y). \tag{11} \]

In order to prove Q_{N−y−1}(y + 1) ≤ Q_{N−y}(y), therefore, one needs only the
inequality Q_{N−y−1}(y) ≤ Q_{N−y}(y). Hence it is sufficient for proving the lemma to
prove the following

Assertion. Let 0 ≤ c ≤ N be given; then

\[ Q_M(c) \le Q_{M+1}(c) \tag{12} \]

for all integers M.

To prove the assertion, note that Q_M(c) is the probability that E_k is chosen
after M observations; and that if an (M + 1)st observation were taken, the vector
(y_{k−g+2,M+1}, …, y_{k,M+1}) could lead to any one of g possible decisions. Likewise,
Q_{M+1}(c) is the probability under the same selection procedure that E_k is chosen
after (M + 1) observations, and this event might have arisen from any of a
number of sequences, some of which would have led to E_k, some of which would
have led to E_{k−1}, …, some of which would have led to E_{k−g+1}, had the
selection been made after the Mth observation. In short, for fixed c, let R(i, j) =
Pr {choose E_i after M observations and choose E_j after (M + 1) observations},
where i, j = k − g + 1, …, k. Then we have

\[ Q_{M+1}(c) = \sum_{j=k-g+1}^{k} R(j, k), \qquad Q_M(c) = \sum_{j=k-g+1}^{k} R(k, j); \tag{13} \]

also

\[ Q_{M+1}(c) - Q_M(c) = \sum_{j=k-g+1}^{k} \left[ R(j, k) - R(k, j) \right]. \tag{14} \]

Thus it suffices for proving Lemma 2 to show that each of the terms
R(j, k) − R(k, j) in (14) is non-negative.


In the following, the subscript M is dropped from the cumulative sums y_{iM};
it is understood that the symbols y_i (i = k − g + 1, …, k) stand for cumulative
sums after M observations. Write m = max(c, y_{k−g+2}, …, y_k). m is said to
occur s times if exactly s of the numbers c, y_{k−g+2}, …, y_k are equal to m and the
other y_i's are less than m. The detailed expressions below for R(j, k) and R(k, j)
apply only for j = k − g + 2, …, k − 1. It is, of course, necessary to show
R(k − g + 1, k) − R(k, k − g + 1) ≥ 0 as well; however, we do not write out
the expressions here. The argument for this case is much the same as is set out
below for the other j, and is in fact somewhat simpler. In the selection procedure
which defines Q_{M+1}(c), the only way one can choose E_{k−g+1} after having chosen
E_k on the previous observation is through the randomization process. Thus the
last two parts of R(k, k − g + 1) become zero. This also has the effect of
strengthening the desired inequality.

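One has, then, for j = k − g + 2, …, k − 1,

\[
\begin{aligned}
R(j, k) = \sum_{s=1}^{g} \Big[ &\frac{1}{s^{2}} \Pr\{ y_j = y_k = m;\ m \text{ occurs } s \text{ times};\ y_i < m - 1 \text{ for } y_i \ne m;\ E_i \text{ occurs at the } (M+1)\text{st observation, for some } i \text{ such that } y_i \ne m \} \\
&+ \frac{1}{s(s+1)} \Pr\{ y_j = y_k = m;\ m \text{ occurs } s \text{ times};\ y_{i_1} = \cdots = y_{i_u} = m - 1;\ y_i < m - 1 \text{ if } y_i \le m - 1 \text{ and } i \ne i_1, \ldots, i_u;\ E_{i_h} \text{ occurs at the } (M+1)\text{st observation for some } h = 1, \ldots, u \} \\
&+ \frac{1}{s(s+1)} \Pr\{ y_j = m = y_k + 1;\ m \text{ occurs } s \text{ times};\ E_k \text{ occurs at the } (M+1)\text{st observation} \} \\
&+ \frac{1}{s} \Pr\{ y_j = y_k = m;\ m \text{ occurs } s \text{ times};\ E_k \text{ occurs at the } (M+1)\text{st observation} \} \Big].
\end{aligned} \tag{15}
\]

Similarly,

\[
\begin{aligned}
R(k, j) = \sum_{s=1}^{g} \Big[ &\frac{1}{s^{2}} \Pr\{ y_j = y_k = m;\ m \text{ occurs } s \text{ times};\ y_i < m - 1 \text{ for } y_i \ne m;\ E_i \text{ occurs at the } (M+1)\text{st observation, for some } i \text{ such that } y_i \ne m \} \\
&+ \frac{1}{s(s+1)} \Pr\{ y_j = y_k = m;\ m \text{ occurs } s \text{ times};\ y_{i_1} = \cdots = y_{i_u} = m - 1;\ y_i < m - 1 \text{ if } y_i \le m - 1 \text{ and } i \ne i_1, \ldots, i_u;\ E_{i_h} \text{ occurs at the } (M+1)\text{st observation for some } h = 1, \ldots, u \} \\
&+ \frac{1}{s(s+1)} \Pr\{ y_k = m = y_j + 1;\ m \text{ occurs } s \text{ times};\ E_j \text{ occurs at the } (M+1)\text{st observation} \} \\
&+ \frac{1}{s} \Pr\{ y_j = y_k = m;\ m \text{ occurs } s \text{ times};\ E_j \text{ occurs at the } (M+1)\text{st observation} \} \Big].
\end{aligned} \tag{16}
\]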


Each of the probabilities R(j, k) and R(k, j) has been divided into four parts, and
the parts are now considered in turn. The first two parts of (15) are identical with
the first two parts of (16), hence these parts contribute nothing to R(j, k) −
R(k, j). As for the third parts, a typical term from the third part of (15) may be
written

\[ \frac{1}{s(s+1)}\, \frac{M!}{(m!)^{s} (m-1)!\, y_{i_1}! \cdots y_{i_{g-s-2}}!}\, (\theta^* p_1)^{m-1} p_1^{M-m+1} \cdot \theta^* p_1, \tag{17} \]

where p_1 = 1/(θ* + g − 2). Similarly a corresponding term from the third part
of (16) may be found merely by switching the role of k and j; this term may be
written

\[ \frac{1}{s(s+1)}\, \frac{M!}{(m!)^{s} (m-1)!\, y_{i_1}! \cdots y_{i_{g-s-2}}!}\, (\theta^* p_1)^{m} p_1^{M-m} \cdot p_1, \tag{18} \]

which is equal to (17). Since the terms in the third parts of (15) and (16) can be
put into one-one correspondence, with the corresponding terms being equal, the
third parts contribute nothing to R(j, k) − R(k, j). One only has to consider the
fourth parts of (15) and (16) therefore.

A typical term from the fourth part of (15) may be written

\[ \frac{1}{s}\, \frac{M!}{(m!)^{s}\, y_{i_1}! \cdots y_{i_{g-s-1}}!}\, (\theta^* p_1)^{m} p_1^{M-m} \cdot \theta^* p_1, \tag{19} \]

while the corresponding term from the fourth part of (16) may be written

\[ \frac{1}{s}\, \frac{M!}{(m!)^{s}\, y_{i_1}! \cdots y_{i_{g-s-1}}!}\, (\theta^* p_1)^{m} p_1^{M-m} \cdot p_1. \tag{20} \]

Expression (19) is at least as large as expression (20), since θ* > 1. Thus the
fourth part of (15) is at least as large as the fourth part of (16) and we have
R(j, k) − R(k, j) ≥ 0. The lemma then follows.


3. Proof of the property of the least favorable configuration.

Theorem. Let 1 < θ* < ∞ be given, and suppose p_k ≥ θ* p_i (i = 1, …, k − 1).
Then

\[ \phi(p_1, \ldots, p_k) \ge \phi\!\left( \frac{1}{\theta^* + k - 1}, \ldots, \frac{1}{\theta^* + k - 1}, \frac{\theta^*}{\theta^* + k - 1} \right). \tag{21} \]

Proof of Theorem. Let a vector of probabilities (p_1, …, p_k) be given which
satisfies the hypotheses. By applying Lemma 1 repeatedly, one can produce a
configuration (p'_1, …, p'_k) such that φ(p'_1, …, p'_k) ≤ φ(p_1, …, p_k), and
which has the following properties:

(22)  (a) p'_k = p_k
      (b) p'_i = p'_k/θ* for h − 2 of the p'_i
      (c) p'_i = 0 for k − h of the p'_i
      (d) 0 ≤ p'_i ≤ p'_k/θ* for the one remaining p'_i

where h is an integer 2 ≤ h ≤ k. If p'_i = p'_k/θ* for the i in (d) of (22), then the
p'_i can be renumbered so that the configuration is in the form of (24), and we can
proceed immediately to apply Lemma 2 with g = h + 1. If that p'_i < p'_k/θ*,
however, we can without loss of generality renumber the p'_i (i < k) and
rewrite (22) as follows:

(23)  (a) p'_k = p_k
      (b) p'_{k−1} = p'_{k−2} = … = p'_{k−h+2} = p'_k/θ*
      (c) p'_{k−h+1} = 1 − Σ_{i=k−h+2}^{k} p'_i < p'_k/θ*
      (d) p'_{k−h} = p'_{k−h−1} = … = p'_1 = 0.

One can now apply Lemma 2 to the p'_i by substituting h for the g of that
lemma. The result is a new configuration (p''_1, …, p''_k) which yields a probability
of correct selection at most as large as the one corresponding to (p'_1, …, p'_k),
and which has

(24)  (a) p''_k = θ* p''_{k−1} = … = θ* p''_{k−h+1}
      (b) p''_{k−h} = p''_{k−h−1} = … = p''_1 = 0.

Another application of Lemma 2, with g = h + 1, to this new configuration will
yield still another configuration whose probability of correct selection is again
not larger than the one corresponding to (p'_1, …, p'_k), and for which property
(24) holds with h replaced by h + 1.

Repeated applications of Lemma 2 lead to a configuration where (24) holds
with h = k, and this configuration certainly yields a probability of correct
selection not exceeding φ(p_1, …, p_k), since the
probability of correct selection did not increase at each step. But (24) with
h = k is the configuration whose probability of correct selection is given in the
right side of (21). Hence the theorem is proved.


REFERENCE 


[1] R. E. Bechhofer, S. Elmaghraby, and N. Morse, “A single-sample multiple-decision
procedure for selecting the multinomial event which has the highest probability,”
Ann. Math. Stat., Vol. 30 (1959), pp. 102-119.





A CLASSIFICATION PROBLEM INVOLVING MULTINOMIALS' 


By Oscar WESLER 
University of Michigan 


1. Introduction. The problem of the k-faced die. There are several distinct 
classification problems involving multinomials, each known as the problem of the 
k-faced die. For example, the problem may be to decide whether the die is loaded, 
and indeed there are several versions of this. One, say, in which the loading is 
specified; another, in which the unknown loading is estimated from one sample, 
and then a decision is made as to whether a second sample came from the loaded 
die or an honest one [1]. Or the problem may be to determine which of the k 
faces carries a known extra load [2]. Although distinct, all these problems are 
somewhat related and have certain features more or less in common. The version 
we shall treat in this paper is still another and a general one of its kind. It will 
be convenient for our purposes to state it formally as a fixed sample size
two-decision statistical problem considered within the framework of composite
hypotheses. We shall use the notation and terminology of [3].

Let the space Ω of nature's pure strategies consist of two subspaces Ω_1 and
Ω_2, Ω_1 consisting of the k! states got by permuting a known probability
distribution p = (p_1, p_2, …, p_k) over the faces 1, 2, …, k of a k-faced die, Ω_2
consisting similarly of the k! states arising from a known distribution q =
(q_1, q_2, …, q_k). We assume the p_i and q_i strictly positive, and shall further
assume, without loss of generality, the vectors p and q written so that p_1 ≥ p_2 ≥
… ≥ p_k and q_1 ≥ q_2 ≥ … ≥ q_k. The statistician wishes to make one decision
if ω ∈ Ω_1 (the null hypothesis), and another decision if ω ∈ Ω_2 (the alternative
hypothesis), the decision to be made on the basis of a sample of N observations x =
(x_1, …, x_N), or rather on the basis of the sufficient statistic r = (r_1, …, r_k)
representing the number of times each of the k faces appears. Let φ be a
randomized statistical decision procedure such that if r is observed, the null
hypothesis is accepted with probability φ(r) and the alternative hypothesis is accepted
with probability 1 − φ(r). Let α and β, the probabilities of the two kinds of errors,
be the two functions given by

\[ \alpha(\omega \mid \phi) = 1 - \sum_{r} \phi(r) P(r \mid \omega), \qquad \omega \in \Omega_1, \]

\[ \beta(\omega \mid \phi) = \sum_{r} \phi(r) P(r \mid \omega), \qquad \omega \in \Omega_2, \]


Received May 12, 1958.

1 Part of this research was carried out under contract with the Office of Naval Research 
while the author was at Stanford University, and completed at the University of Michigan 
with the partial support of the United States Army Signal Air Defense Engineering Agency, 
Contract No. DA-36-039 SC-64627. 




and let

\[ \alpha(\phi) = \max_{\omega \in \Omega_1} \alpha(\omega \mid \phi), \qquad \beta(\phi) = \max_{\omega \in \Omega_2} \beta(\omega \mid \phi). \]

Then the problem we wish to consider is: Of all procedures φ satisfying the
condition α(φ) = α_0, to find that procedure φ* which minimizes β(φ).

In this paper a game-theoretic minimax method is used to find an entire class
of these desired procedures φ*, optimal in this extended Neyman-Pearson sense.
A simplification of the result is then given for the asymptotic case of large N.
Finally, for illustrative purposes, a look is taken at the special binomial case of 
k = 2, a case deserving of special attention in its own right. We shall begin 
with a brief description of the minimax approach, referring the reader to Section 
7.7 of [3] for detailed proofs. 


2. The minimax method. This method of solving the problem can be understood
in terms of a simple geometric picture. Consider the α, β-set in Euclidean
2-space given by S = {(α(φ), β(φ)): φ ∈ Φ}, where Φ is the class of all randomized
strategies φ. S is a point set in the unit square of the first quadrant and contains
the diagonal on which α + β = 1. Although S is not necessarily convex (unlike
the case of a simple hypothesis against a simple alternative) it can be shown that
the subset T of S lying on or below the above-mentioned diagonal is convex, and
that moreover, in the present case, the points on the lower boundary of T, a
strictly decreasing convex curve joining the point (0, 1) to the point (1, 0),
belong to T. Clearly, it is these points on the lower boundary of T which provide
the solution to the statistical problem posed above: Given α_0, we are interested
in that procedure φ* which determines the point (α(φ*), β(φ*)) on the lower
boundary of T whose first coordinate is equal to α_0. Our problem, then, is to
characterize the class of test procedures φ* which sweep out the entire lower boundary
of T.

Now let G_u be the statistical game that arises when the losses are taken to be
1 for an error of the first kind, a constant u > 0 for an error of the second kind,
and zero otherwise. It is readily seen from our picture that a minimax solution
φ_u for G_u determines that point on the lower boundary of T for which α(φ_u) =
uβ(φ_u). It follows at once, therefore, that, by varying u between 0 and ∞, the class
of corresponding minimax solutions φ_u will generate the desired lower boundary of T.


3. The general solution. The minimax method requires that we solve the
statistical game G_u for arbitrary u > 0. The problem is clearly invariant under
the symmetric group of permutations on k letters. We shall therefore set about
finding the class of (unique) invariant minimax procedures which will sweep
out the lower boundary of T.² For fixed g, with 0 < g < 1, let λ_g be the a priori
probability distribution over Ω which selects Ω_1 with probability g, Ω_2 with
probability 1 − g, and which is uniform within both Ω_1 and Ω_2, i.e. λ_g(ω) = g/k!
for ω ∈ Ω_1 and λ_g(ω) = (1 − g)/k! for ω ∈ Ω_2. Define the distribution function
p(r) over the set of all possible outcomes r = (r_1, …, r_k) of the N observations
by the formula,

\[ p(r) = \frac{1}{k!} \sum \frac{N!}{r_1! \cdots r_k!}\, p_{i_1}^{r_1} p_{i_2}^{r_2} \cdots p_{i_k}^{r_k}, \]

where the summation is taken over all permutations (i_1, …, i_k) of the indices
1, …, k. In other words, p(r) is the average of all the multinomial distributions
over the r's arising from the probability vector p = (p_1, …, p_k). p(r) is clearly
symmetric in r, that is, p(r) = p(r_1, r_2, …, r_k) = p(r_{i_1}, r_{i_2}, …, r_{i_k}). Similarly,
define the symmetric distribution q(r) as the average of the multinomial
distributions determined by the probability vector q = (q_1, …, q_k). The Bayes
procedure against λ_g is readily computed in the usual way to be:

\[ \phi_{c,t}(r) = \begin{cases} 1 & \text{if } q(r)/p(r) < g/[(1 - g)u] = c(g, u) = c, \\ 0 & \text{if } q(r)/p(r) > c, \\ t & \text{if } q(r)/p(r) = c, \end{cases} \tag{1} \]

where t is any fixed number satisfying 0 ≤ t ≤ 1. We note that the procedure
φ_{c,t} is a symmetric function of r and depends on g and u only through c. Further,
for any choice of t and c, we see that any pair of numbers g_0, u_0 leading to c
will yield the same procedure φ_{c,t}, which will be Bayes against λ_{g_0} in the game
G_{u_0}. Moreover, c varies between 0 and ∞ as g and u vary respectively between
0 and 1, and 0 and ∞. Varying c between 0 and ∞, and t between 0 and 1, the
symmetric partitions of the set of all possible r's into acceptance and rejection
regions defined by the inequalities in (1) are clearly such that the acceptance
regions vary monotonically from the empty set to the whole space, so that the
corresponding α's vary from 1 to 0. By an obvious continuity argument, therefore,
given any α_0 with 0 ≤ α_0 ≤ 1, we can by suitably choosing c and t find a
procedure φ_{c,t} such that α(φ_{c,t}) = α_0. Now, consider any one such procedure φ_{c,t}. From the
symmetry of φ_{c,t} it follows that its associated risk function ρ(ω, φ_{c,t}) is constant
over Ω_1 and constant over Ω_2, equal to α(φ_{c,t}) and uβ(φ_{c,t}) respectively. Selecting
u_0 so that α(φ_{c,t}) = u_0 β(φ_{c,t}) and then g_0 so that g_0/(1 − g_0)u_0 = c, we see that
the procedure φ_{c,t} is Bayes against λ_{g_0} and has a constant risk function over Ω,
hence φ_{c,t} is a minimax procedure for the game G_{u_0}, and determines the point
(α(φ_{c,t}), β(φ_{c,t})) on the lower boundary of T. By continuity, therefore, the class of
invariant procedures φ_{c,t} with 0 ≤ c ≤ ∞, 0 ≤ t ≤ 1, determines the entire lower
boundary of T and provides the extended Neyman-Pearson solution to our
classification problem. Finally, since any minimax invariant procedure φ* for a game
G_{u_0} must be Bayes against the corresponding λ_{g_0}, and since the only such invariant
Bayes procedures are of the form φ_{c,t}, this proves the uniqueness of the class
of invariant minimax procedures.

² For the specific problem under consideration, it is very easily shown that only invariant
procedures need be considered in minimizing β(φ). In more general situations, see [5].

It might perhaps be useful for certain purposes to give the class of invariant
partitions or procedures the following somewhat simpler geometric description.
Let Ξ be the fundamental probability simplex in k-space, consisting of all vectors
ξ = (ξ_1, …, ξ_k) with ξ_i ≥ 0, Σ_{i=1}^{k} ξ_i = 1, and let δ_i = r_i/N, so that δ =
(δ_1, …, δ_k) ∈ Ξ. By the symmetry of the problem and its solution, we shall
assume without loss of generality that r_1 ≥ … ≥ r_k, so that the observed
probability vector δ as well as the two given vectors p and q all belong to the
sub-simplex Ξ' of Ξ defined by ξ_1 ≥ ξ_2 ≥ … ≥ ξ_k. Then, given α_0, the desired
optimal procedure φ_{c,t} may be characterized in the first place by the simple
partition of Ξ' into two regions of δ's (the acceptance region and rejection region,
with possible mixing on the boundary) given by the defining inequalities specified
in (1); the symmetric images of this partition under all k! permutations then give
the corresponding symmetric partition of Ξ which constitutes the whole
procedure φ_{c,t}.
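For small k the symmetrized distributions and the resulting randomized procedure can be written out directly. The sketch below averages the multinomial probability over all k! permutations of the probability vector, as in the definition of p(r), and returns the probability of accepting the null hypothesis according to the three cases of (1). The function names are illustrative, and the equality case is tested by exact floating-point comparison, which is an assumption of convenience; a practical implementation would want a tolerance or exact rational arithmetic.

```python
from itertools import permutations
from math import factorial, prod

def symmetrized_pmf(r, p):
    """Average of the multinomial pmf of the counts r over all permutations of p."""
    coef = factorial(sum(r))
    for ri in r:
        coef //= factorial(ri)  # multinomial coefficient N!/(r_1! ... r_k!)
    perms = list(permutations(p))
    return coef * sum(prod(pi ** ri for pi, ri in zip(perm, r)) for perm in perms) / len(perms)

def accept_probability(r, p, q, c, t):
    """Probability of accepting the null hypothesis under the procedure (1)."""
    ratio = symmetrized_pmf(r, q) / symmetrized_pmf(r, p)
    if ratio < c:
        return 1.0
    if ratio > c:
        return 0.0
    return t  # boundary case: randomize

p = (0.5, 0.3, 0.2)      # hypothetical null vector
q = (0.4, 0.35, 0.25)    # hypothetical alternative vector
print(accept_probability((3, 1, 0), p, q, c=1.0, t=0.5))
```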


4. The case of large N. The kaleidoscopic procedures. For large N, an
approximation suitable for most purposes may be given which materially simplifies
the class of optimal partitions.

Confining our attention to Ξ', the inequality q(r)/p(r) < c in (1) may be
written, on factoring out the largest term from both numerator and denominator, as

\[ \left( \frac{q_1}{p_1} \right)^{r_1} \left( \frac{q_2}{p_2} \right)^{r_2} \cdots \left( \frac{q_k}{p_k} \right)^{r_k} \frac{1 + \Sigma_1}{1 + \Sigma_2} < c, \]

where, for large N, hence only approximately for large r_1, r_2, …, r_k, both Σ_1
and Σ_2 may be taken as zero, or as certain positive integers less than k!, depending
on the number of equalities existing among the given components q_i of q
and among the p_i of p. In any event, taking logarithms, we may then regard the
optimal partitions of Ξ' as approximately determined by the one parameter
family of hyperplanes

\[ \sum_{i=1}^{k} \delta_i \log \frac{q_i}{p_i} = b \qquad (b \text{ an arbitrary constant}). \tag{2} \]

That is, for arbitrary b and t, 0 ≤ t ≤ 1, the optimal procedure φ_{b,t} amounts to
placing the points δ of Ξ' on the side of the hyperplane where

\[ \sum_{i=1}^{k} \delta_i \log \frac{q_i}{p_i} < b \]

into the acceptance region, the points on the other side into the rejection region,
those on the hyperplane mixed in the proportion t, 1 − t, and then taking the
symmetric images of this partition to get the symmetric partition of Ξ as a whole.
In view of our geometric description, we shall call these approximating φ_{b,t}
the kaleidoscopic procedures.
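A minimal sketch of the large-N rule follows. It assumes, as in the text, that p and q are strictly positive and written in non-increasing order; the observed proportions are sorted into the same order, which stands in for taking symmetric images of the partition. The function names are illustrative, and the randomization on the hyperplane itself is left out.

```python
import math

def kaleidoscopic_statistic(r, p, q):
    """Sum of delta_i * log(q_i / p_i), with the observed proportions delta
    sorted in non-increasing order to match the ordering assumed for p and q."""
    n = sum(r)
    delta = sorted((ri / n for ri in r), reverse=True)
    return sum(d * math.log(qi / pi) for d, pi, qi in zip(delta, p, q))

def accept_null(r, p, q, b):
    """True when the observed point falls on the acceptance side of hyperplane (2)."""
    return kaleidoscopic_statistic(r, p, q) < b

p = (0.5, 0.3, 0.2)      # hypothetical null vector, non-increasing
q = (0.4, 0.35, 0.25)    # hypothetical alternative vector, non-increasing
print(kaleidoscopic_statistic((2, 5, 3), p, q), accept_null((2, 5, 3), p, q, b=0.0))
```

Counts proportional to p give a negative statistic and counts proportional to q a positive one, so a threshold b near zero separates the two hypotheses in this toy case; the value of b actually needed for a given α_0 is not computed here.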







To see at once the reason for this rather striking language, we invite the reader
to consider the case of k = 3, and to draw a few simple figures, with the points
$\xi = (\xi_1, \xi_2, \xi_3)$ of $\Xi$ represented in barycentric coordinates by a
point in an equilateral triangle with unit altitude, the distances from the point
to the three sides being the values $\xi_1$, $\xi_2$ and $\xi_3$. By selecting
different pairs of points for p and q so as to get separating lines with different
slopes, and by varying b so as to translate these lines in a parallel fashion, the
reason for the term kaleidoscopic will appear before his eyes.

In general, by a suitable choice of b and of the mixing factor t, the symmetric
or kaleidoscopic test can be made to give us any desired $\alpha_0$ with
approximately the minimum $\beta$.


5. The case of k = 2: The binomial case. We specialize the previous discus- 
sion to k = 2 as deserving of special attention, and also to provide a simple 
illustration of the discussion and results. We refer the reader to problem 7.7.3 
of [3] and to pages 75-76 of [4] for a particular example. 

We are now in the binomial case and shall speak of coins rather than dice.
Two probability distributions, $p = (p_1, p_2)$ and $q = (q_1, q_2)$, are given.
Under p the coin falls heads with probability either $p_1$ or $p_2 = 1 - p_1$,
under q heads with probability either $q_1$ or $q_2 = 1 - q_1$. Assume things
written so that $1 > p_1 > q_1 \ge 1/2$. Let $r = (r_1, N - r_1)$ represent,
respectively, the number of heads and tails appearing among the N tosses of the
coin. Then p(r), given by

$$p(r) = \frac{1}{2}\binom{N}{r_1}\left[p_1^{r_1} p_2^{N-r_1} + p_1^{N-r_1} p_2^{r_1}\right],$$

and q(r) given similarly, are symmetric bimodal distributions over the number
$r_1$ of heads, $r_1 = 0, 1, \ldots, N$. The likelihood ratio q(r)/p(r) which
determines the optimal procedures in (1) may be written

(3) $\dfrac{q(r)}{p(r)} = \dfrac{q_1^{r_1} q_2^{N-r_1} + q_1^{N-r_1} q_2^{r_1}}{p_1^{r_1} p_2^{N-r_1} + p_1^{N-r_1} p_2^{r_1}} = \left(\dfrac{q_1 q_2}{p_1 p_2}\right)^{N-r_1} \dfrac{q_1^{2r_1-N} + q_2^{2r_1-N}}{p_1^{2r_1-N} + p_2^{2r_1-N}}.$


Without bothering to convert r to $\xi$, the kaleidoscopic procedures given by (2)
are immediately seen to be two-sided symmetric tests of the following form:
Take action 1 (decide for p) if $r_1 < j$ or $r_1 > N - j$, action 2 (decide for q)
if $j < r_1 < N - j$, and mix actions 1 and 2 in the proportion t, 1 - t if
$r_1 = j$ or $N - j$, where j is allowed to run over the integers 0, 1, ..., [N/2]
and t is any number satisfying $0 < t \le 1$. Denote this two-sided symmetric test
by (j, t). Then the kaleidoscopic procedures are given by the set of all pairs
(j, t) with j = 0, 1, ..., [N/2], $0 < t \le 1$. Note that if $p_1 < q_1$ the
procedures are of course the same, except that the actions taken are reversed.

The interesting feature here, which we shall prove, is that the optimal
procedures of (1) and the (j, t) of (2) are the same, in exact agreement for every N,
so that there is no question here of approximations or of loss of power in the
approximation. To see this, let us agree because of symmetry to consider the
likelihood ratio (3) for only those outcomes $r = (r_1, N - r_1)$ for which
$2r_1 \ge N$.







Then the assertion amounts to showing that the inequality $q(r)/p(r) < c$
for any one such r implies the same inequality for all r with a larger $r_1$.
Suppose, therefore, $q(r)/p(r) < c$. Since $q_1 q_2 > p_1 p_2$, the first factor in
the factored form of (3) decreases with increasing $r_1$. If we can show that the
second factor decreases as well with increasing $r_1$, the assertion will be
proved. We require the following simple lemma.

LEMMA: The function $y = [x^{m+1} + (1 - x)^{m+1}]/[x^m + (1 - x)^m]$ considered
on the unit interval $0 < x < 1$, with m an arbitrary non-negative integer, is
decreasing for $0 < x \le 1/2$ and (by symmetry) increasing for $1/2 \le x < 1$.

PROOF: Taking derivatives, a straightforward calculation will show that $y'$
has the same sign as
$[x^{2m} - (1 - x)^{2m}] + m x^{m-1}(1 - x)^{m-1}[x^2 - (1 - x)^2]$.
Clearly, $y' \le 0$ for $0 < x \le 1/2$, hence the lemma.

Applying the lemma for successive values of m and multiplying functions
together, there follows at once the more general fact that the same decreasing
property over $0 < x \le 1/2$ and increasing property over $1/2 \le x < 1$ is
exhibited by the function

$$\frac{x^n + (1 - x)^n}{x^m + (1 - x)^m}$$

for arbitrary integers $n > m \ge 0$. From our assumption of
$1/2 \le q_1 < p_1 < 1$ there follows the inequality

$$\frac{q_1^n + q_2^n}{q_1^m + q_2^m} < \frac{p_1^n + p_2^n}{p_1^m + p_2^m}$$

or, equivalently, the inequality

$$\frac{q_1^n + q_2^n}{p_1^n + p_2^n} < \frac{q_1^m + q_2^m}{p_1^m + p_2^m}$$

for arbitrary integers $n > m \ge 0$. But this says that the second factor of (3)
decreases with increasing $r_1$. The asserted agreement between the optimal and
kaleidoscopic procedures is thus proved.
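
The monotonicity just proved can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes that p(r) and q(r) are equal-weight mixtures of the two binomial distributions (the factor 1/2 is an assumption of this sketch), and the values of N, p_1 and q_1 are hypothetical.

```python
from math import comb


def symmetric_binomial(r1, n, p1):
    """Equal-weight mixture of Binomial(n, p1) and Binomial(n, 1 - p1) at r1 heads.

    The weight 1/2 is an assumption of this sketch; it cancels in the ratio.
    """
    p2 = 1.0 - p1
    return 0.5 * comb(n, r1) * (p1**r1 * p2**(n - r1) + p1**(n - r1) * p2**r1)


def likelihood_ratio(r1, n, p1, q1):
    """The ratio q(r)/p(r) of equation (3) for the outcome r = (r1, n - r1)."""
    return symmetric_binomial(r1, n, q1) / symmetric_binomial(r1, n, p1)


# Hypothetical values satisfying 1/2 <= q1 < p1 < 1.
n, p1, q1 = 20, 0.8, 0.6

# Outcomes with 2*r1 >= n, the half of the sample space considered in the proof.
upper_half = range((n + 1) // 2, n + 1)
ratios = [likelihood_ratio(r1, n, p1, q1) for r1 in upper_half]

# The ratio should strictly decrease as r1 grows.
is_decreasing = all(a > b for a, b in zip(ratios, ratios[1:]))

# Sanity check that the mixture is a probability distribution.
total_mass = sum(symmetric_binomial(r1, n, p1) for r1 in range(n + 1))
```

Because the ratio is monotone on this half of the sample space, every region of the form q(r)/p(r) < c is a tail in $r_1$, which is exactly the shape of the tests (j, t).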

It should be pointed out that in practice, given $\alpha_0$, it becomes a simple
matter in this case, using tables of the binomial distribution $P_N(r_1 \mid p_1)$,
to calculate the desired test (j, t) from the condition $\alpha(j, t) = \alpha_0$.
The minimized $\beta$, namely $\beta(j, t)$, is then read off directly from the
tables of $P_N(r_1 \mid q_1)$.
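
The table lookup described above can be sketched in code. This is an illustration under stated assumptions rather than the author's procedure: $\alpha(j, t)$ is taken to be the probability, when p is true, of taking action 2 (deciding for q), $\beta(j, t)$ the probability, when q is true, of taking action 1, and the distributions are the equal-weight binomial mixtures. The function names and numerical values are hypothetical.

```python
from math import comb


def pmf(r1, n, p1):
    """Equal-weight mixture of Binomial(n, p1) and Binomial(n, 1 - p1) (assumed form)."""
    return 0.5 * comb(n, r1) * (p1**r1 * (1 - p1)**(n - r1)
                                + p1**(n - r1) * (1 - p1)**r1)


def interior(j, n, p1):
    """Probability that j < r1 < n - j, where action 2 is always taken."""
    return sum(pmf(r, n, p1) for r in range(j + 1, n - j))


def boundary(j, n, p1):
    """Probability that r1 equals j or n - j, where the two actions are mixed."""
    return sum(pmf(r, n, p1) for r in {j, n - j})


def prob_action2(j, t, n, p1):
    """Probability of deciding for q under test (j, t); action 1 has weight t on the boundary."""
    return interior(j, n, p1) + (1.0 - t) * boundary(j, n, p1)


def find_test(alpha0, n, p1):
    """Return (j, t) with prob_action2(j, t, n, p1) == alpha0, for 0 <= alpha0 < 1."""
    for j in range(n // 2 + 1):
        inner = interior(j, n, p1)
        if inner <= alpha0:
            # Smallest such j, so alpha0 < inner + boundary and 0 < t <= 1.
            t = 1.0 - (alpha0 - inner) / boundary(j, n, p1)
            return j, t
    raise ValueError("alpha0 must lie in [0, 1)")


n, p1, q1, alpha0 = 20, 0.8, 0.6, 0.05   # hypothetical inputs
j, t = find_test(alpha0, n, p1)
alpha = prob_action2(j, t, n, p1)         # error probability when p is true
beta = 1.0 - prob_action2(j, t, n, q1)    # error probability when q is true
```

The search uses the fact that the interior probability shrinks as j grows, so the first j whose interior probability does not exceed $\alpha_0$ is the one whose boundary must be randomized.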


REFERENCES 


[1] Herman Chernoff, "A classification problem," Technical Report 33, Applied
Mathematics and Statistics Laboratory, Stanford University, 1956.

[2] S. G. Allen, Jr., "A procedure for determining the loaded face of a die," Technical
Report 13, Applied Mathematics and Statistics Laboratory, Stanford University,
1952.

[3] David Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions,
John Wiley and Sons, New York, 1954, pp. 200-207 in particular.

[4] Oscar Wesler, Solutions to Problems in Blackwell and Girshick's Theory of Games and
Statistical Decisions, John Wiley and Sons, New York, 1954.

[5] Oscar Wesler, "Invariance theory and a modified minimax principle," Ann. Math.
Stat., Vol. 30 (1959), pp. 1-20.





ON THE LIMITING DISTRIBUTION OF THE NUMBER OF 
COINCIDENCES CONCERNING TELEPHONE TRAFFIC 


By L. Takács
Hungarian Academy of Sciences, Budapest 


1. Introduction. Let us consider a telephone exchange. Suppose that the
subscribers make calls at the instants $\tau_1, \tau_2, \ldots, \tau_n, \ldots$,
where $\tau_n - \tau_{n-1}$ ($n = 1, 2, \ldots$; $\tau_0 = 0$) are identically
distributed independent positive random variables with the distribution function
F(x). Put $\varphi(s) = \int_0^\infty e^{-sx}\, dF(x)$,
$\alpha = \int_0^\infty x\, dF(x)$ and
$\sigma^2 = \int_0^\infty (x - \alpha)^2\, dF(x)$.

Suppose that there is an infinite number of fully available channels and that
each call gives rise to a connection (conversation) on one of the free channels.
Denote by $\chi_n$ the duration of the holding time beginning at the instant
$\tau_n$ ($n = 1, 2, \ldots$). It is assumed that $\chi_n$ ($n = 1, 2, \ldots$) are
identically distributed mutually independent positive random variables, which are
independent also of the random variables $\tau_n$ ($n = 1, 2, \ldots$). Suppose
that $P\{\chi_n \le x\} = 1 - e^{-\mu x}$, if $x \ge 0$.

We say that the system is in state $E_k$ ($k = 0, 1, 2, \ldots$) if k channels are busy.

In what follows we shall deal with the determination of the distribution of the
number of transitions $E_k \to E_{k+1}$ ($k = 0, 1, 2, \ldots$) occurring in the
time interval (0, t] and the corresponding asymptotic distribution as
$t \to \infty$.

The above problem was solved earlier by the author [7] in the particular case
when $\{\tau_n\}$ forms a Poisson process with density $\lambda$.


2. Notation. Denote by $\eta(t)$ the number of busy channels at the instant t
and put

$$P\{\eta(t) = k\} = P_k(t).$$

Define the r-th binomial moment of $\eta(t)$ as follows:

$$B_r(t) = \sum_{k=r}^{\infty} \binom{k}{r} P_k(t),$$

and put

$$\beta_r(s) = \int_0^\infty e^{-st} B_r(t)\, dt, \qquad (\Re(s) > 0).$$

Further denote by $\nu_t^{(k)}$ the number of transitions $E_k \to E_{k+1}$,
($k = 0, 1, 2, \ldots$), occurring in the time interval (0, t]. (We say that a
transition $E_{-1} \to E_0$ takes place at t = 0.) Denote by $m_k(t)$ the
expectation of the random variable $\nu_t^{(k)}$. ($m_{-1}(t) = 1$ if $t \ge 0$
and $m_{-1}(t) = 0$ if $t < 0$.)

Finally denote by m(t) the expectation of the number of calls taking place in
the time interval (0, t]. Then

$$m(t) = \sum_{n=1}^{\infty} F_n(t),$$

where $F_n(t)$ denotes the n-th iterated convolution of F(t) with itself. Clearly,

$$\int_0^\infty e^{-st}\, dm(t) = \frac{\varphi(s)}{1 - \varphi(s)}, \qquad (\Re(s) > 0).$$

Received June 20, 1958; revised October 24, 1958.


3. The solution of the problem. If specifically $\{\tau_n\}$ forms a Poisson
process with density $\lambda$, then $\{\eta(t)\}$ is a Markov process. In other
cases $\{\eta(t)\}$ ceases to be a Markov process, but the instants $\tau_n$ always
form the Markov points (or regeneration points) of the process. Accordingly, for
fixed k ($k = 0, 1, 2, \ldots$), the instants of the successive transitions
$E_k \to E_{k+1}$ form a recurrent (or renewal) process.

Denote by $R_k(x)$ the distribution function of the distance between two
consecutive transitions $E_k \to E_{k+1}$, and by $R_k^*(x)$ the distribution
function of the distance between the first transition $E_k \to E_{k+1}$ and the
zero point. Knowing $R_k(x)$ and $R_k^*(x)$, the distribution function of
$\nu_t^{(k)}$ can be determined easily; namely, we have

(1) $P\{\nu_t^{(k)} > n\} = R_k^* * R_k * \cdots * R_k(t),$

where the right hand side contains the n-th iterated convolution of $R_k(t)$.
Define

(2) $\mu_k = \int_0^\infty x\, dR_k(x)$

and

(3) $\sigma_k^2 = \int_0^\infty (x - \mu_k)^2\, dR_k(x).$

If $\sigma_k^2 < \infty$, then we have

(4) $\lim_{t \to \infty} P\left\{ \dfrac{\nu_t^{(k)} - t/\mu_k}{\sigma_k t^{1/2} / \mu_k^{3/2}} \le x \right\} = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy,$

as is well known in renewal theory. (Cf. W. Feller [1], W. L. Smith [4], and the
author [5].)

If we consider other initial conditions than $\eta(0) = 0$, then we obtain similar
results. In particular, the limiting distribution (4) is independent of the initial
condition.

Thus, the problem is reduced to the determination of the distribution functions
$R_k(x)$ and $R_k^*(x)$. We need some auxiliary theorems, which will be proved below.



4. The Palm functions. Hitherto we have not made any restrictions concerning
the servicing of the calls. Now, following C. Palm [3], let us suppose that
the channels are numbered by 1, 2, ..., r, ..., and that an incoming call
realizes a connection through that idle channel which has the lowest serial
number. This assumption does not restrict the generality since $\{\eta(t)\}$ is
independent of the system of the handling of traffic. Now denote by
$\tau_1^{(r)}, \tau_2^{(r)}, \ldots, \tau_n^{(r)}, \ldots$ the instants of the calls
which find all channels busy in the group (1, 2, ..., r), leaving the other
channels out of consideration. Obviously the time differences
$\tau_{n+1}^{(r)} - \tau_n^{(r)}$ ($n = 1, 2, \ldots$) are identically distributed
independent positive random variables. Let us denote by $G_r(x)$ their common
distribution function. We shall prove the following


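THEOREM 1. Define

(5) $\gamma_r(s) = \int_0^\infty e^{-sx}\, dG_r(x), \qquad (r = 0, 1, 2, \ldots);$

then we have

(6) $\gamma_r(s) = \dfrac{\sum_{j=0}^{r} \binom{r}{j} \prod_{i=0}^{j-1} \dfrac{1 - \varphi(s + i\mu)}{\varphi(s + i\mu)}}{\sum_{j=0}^{r+1} \binom{r+1}{j} \prod_{i=0}^{j-1} \dfrac{1 - \varphi(s + i\mu)}{\varphi(s + i\mu)}},$

where the empty product is 1 and $G_0(x) = F(x)$.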
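PROOF. C. Palm [3] has proved that the distribution functions $G_r(x)$
($r = 1, 2, \ldots$) satisfy the following system of integral equations:

(7) $G_r(x) = G_{r-1}(x) - \int_0^x (1 - e^{-\mu y})(1 - G_r(x - y))\, dG_{r-1}(y), \qquad (r = 1, 2, \ldots),$

where $G_0(x) = F(x)$. This can be proved easily. Let us suppose that
$\tau_n^{(r)} = \tau_m^{(r-1)}$ (where $\tau_n^{(0)} = \tau_n$). Then conditionally

$$P\{\tau_{n+1}^{(r)} - \tau_n^{(r)} \le x \mid \tau_{m+1}^{(r-1)} - \tau_m^{(r-1)} = y\} = \begin{cases} e^{-\mu y} + (1 - e^{-\mu y}) G_r(x - y), & \text{if } 0 \le y \le x, \\ 0, & \text{if } y > x, \end{cases}$$

and by the theorem of total probability we have

$$P\{\tau_{n+1}^{(r)} - \tau_n^{(r)} \le x\} = G_r(x) = \int_0^x \left[e^{-\mu y} + (1 - e^{-\mu y}) G_r(x - y)\right] dG_{r-1}(y),$$

which proves (7).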


Taking the Laplace-Stieltjes transform of (7) we obtain Palm's recurrence
formula,

(8) $\gamma_r(s) = \dfrac{\gamma_{r-1}(s + \mu)}{1 - \gamma_{r-1}(s) + \gamma_{r-1}(s + \mu)},$

where $\gamma_0(s) = \varphi(s)$.

If we define

(9) $D_r(s) = \sum_{j=0}^{r} \binom{r}{j} \prod_{i=0}^{j-1} \dfrac{1 - \varphi(s + i\mu)}{\varphi(s + i\mu)},$

then it is easy to see that

(10) $D_{r+1}(s) = D_r(s) + \dfrac{1 - \varphi(s)}{\varphi(s)} D_r(s + \mu).$

Further, a simple calculation shows that the function,

(11) $\gamma_r(s) = \dfrac{D_r(s)}{D_{r+1}(s)}, \qquad (r = 0, 1, 2, \ldots),$

satisfies (8) and $\gamma_0(s) = \varphi(s)$. This proves (6).
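
The "simple calculation" can be spot-checked numerically. The sketch below is only an illustration: it evaluates the sums $D_r(s)$ of (9) for a hypothetical Poisson input, $\varphi(s) = \lambda/(\lambda + s)$, and compares the ratio in (11) with one step of Palm's recurrence written as $\gamma_r(s) = \gamma_{r-1}(s+\mu)/[1 - \gamma_{r-1}(s) + \gamma_{r-1}(s+\mu)]$, which is the form obtained by transforming (7). The numerical values of $\lambda$, $\mu$ and s are arbitrary.

```python
from math import comb


def phi(s, lam=2.0):
    """Laplace-Stieltjes transform of an exponential inter-arrival law (assumed example)."""
    return lam / (lam + s)


def D(r, s, mu):
    """The sum D_r(s) of equation (9); the empty product (j = 0) equals 1."""
    total = 0.0
    for j in range(r + 1):
        product = 1.0
        for i in range(j):
            f = phi(s + i * mu)
            product *= (1.0 - f) / f
        total += comb(r, j) * product
    return total


def gamma(r, s, mu):
    """The transform gamma_r(s) as the ratio D_r(s) / D_{r+1}(s) of equation (11)."""
    return D(r, s, mu) / D(r + 1, s, mu)


def palm_step(r, s, mu):
    """One step of Palm's recurrence applied to gamma_{r-1}."""
    a = gamma(r - 1, s, mu)
    b = gamma(r - 1, s + mu, mu)
    return b / (1.0 - a + b)


mu, s = 1.5, 0.7   # hypothetical values
max_gap = max(abs(gamma(r, s, mu) - palm_step(r, s, mu)) for r in range(1, 6))
start_gap = abs(gamma(0, s, mu) - phi(s))
```

Both gaps should vanish up to rounding error, for any choice of $\varphi$ with values in (0, 1).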

5. The binomial moments $B_r(t)$. We shall prove the following:

THEOREM 2. The binomial moments $B_r(t)$, ($r = 0, 1, 2, \ldots$) exist for all
$t \ge 0$, and we have

(12) $\beta_r(s) = \int_0^\infty e^{-st} B_r(t)\, dt = \dfrac{1}{s + r\mu} \prod_{i=0}^{r-1} \dfrac{\varphi(s + i\mu)}{1 - \varphi(s + i\mu)}$

if $\Re(s) > 0$.

PROOF. Introduce the generating function

(13) $G(t, z) = \sum_{k=0}^{\infty} P_k(t) z^k.$

G(t, z) satisfies the following integral equation:

(14) $G(t, z) = [1 - F(t)] + \int_0^t G(t - x, z)\left[1 - (1 - z) e^{-\mu(t - x)}\right] dF(x).$


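This can be proved as follows. Define

$$f(t, u) = \begin{cases} 1, & \text{if } 0 \le t \le u, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$\eta(t) = \sum_{n=1}^{\infty} f(t - \tau_n, \chi_n).$$

Now let us suppose conditionally that $\tau_1 = x$; then

$$\eta(t) = \begin{cases} f(t - x, \chi_1) + \tilde\eta(t - x), & \text{if } x \le t, \\ 0, & \text{if } x > t, \end{cases}$$

where $\tilde\eta(t - x)$ is independent of $f(t - x, \chi_1)$ and has the same
distribution as $\eta(t - x)$. Here the generating function of $f(t - x, \chi_1)$
is $[1 - e^{-\mu(t - x)} + z e^{-\mu(t - x)}]$, if $0 \le x \le t$, and the
generating function of $\tilde\eta(t - x)$ is $G(t - x, z)$, if $0 \le x \le t$.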







Therefore, applying the theorem of total expectation for
$G(t, z) = E\{z^{\eta(t)}\}$, we obtain

$$G(t, z) = [1 - F(t)] + \int_0^t G(t - x, z)\left[1 - e^{-\mu(t - x)} + z e^{-\mu(t - x)}\right] dF(x),$$

which proves (14).

I am indebted to R. Syski for calling my attention to the possibility of the
above proof. Applying the results of R. Fortet [2] or the author [6], R. Syski
showed that G(t, z) satisfies the following integral equation:

(15) $G(t, z) = 1 - (1 - z) \int_0^t G(t - x, z) e^{-\mu(t - x)}\, dm(x),$

where m(t) denotes the expected number of calls occurring in the time interval
(0, t].

Since

(16) $B_r(t) = \dfrac{1}{r!} \left( \dfrac{\partial^r G(t, z)}{\partial z^r} \right)_{z=1}, \qquad (r = 0, 1, 2, \ldots),$

we obtain from (14) that

(17) $B_r(t) = \int_0^t B_r(t - x)\, dF(x) + \int_0^t B_{r-1}(t - x) e^{-\mu(t - x)}\, dF(x), \qquad (r = 1, 2, \ldots).$

This is a linear integral equation of the Volterra type for the unknown $B_r(t)$.
As is well known, the solution is

(18) $B_r(t) = \int_0^t B_{r-1}(t - x) e^{-\mu(t - x)}\, dm(x).$

This can be obtained immediately from Syski's equation (15).
Taking the Laplace transform of (18), we obtain the following functional
equation:

(19) $\beta_r(s) = \dfrac{\varphi(s)}{1 - \varphi(s)}\, \beta_{r-1}(s + \mu).$

Since $B_0(t) \equiv 1$, consequently $\beta_0(s) = 1/s$, and applying repeatedly
formula (19) we finally obtain

$$\beta_r(s) = \frac{1}{s + r\mu} \prod_{i=0}^{r-1} \frac{\varphi(s + i\mu)}{1 - \varphi(s + i\mu)},$$

as was to be proved.
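
As a hedged illustration of (12), consider the Poisson case treated in [7]. The sketch assumes the standard fact that, for Poisson input of density $\lambda$ and an initially empty system, $\eta(t)$ has a Poisson distribution with mean $(\lambda/\mu)(1 - e^{-\mu t})$, so that $B_r(t) = [(\lambda/\mu)(1 - e^{-\mu t})]^r / r!$. Its Laplace transform, obtained by expanding the power with the binomial theorem, is compared with the product formula. Function names and numerical values are hypothetical.

```python
from math import comb, factorial


def beta_formula(r, s, lam, mu):
    """Right-hand side of (12) with phi(s) = lam / (lam + s)."""
    product = 1.0
    for i in range(r):
        f = lam / (lam + s + i * mu)
        product *= f / (1.0 - f)
    return product / (s + r * mu)


def beta_poisson(r, s, lam, mu):
    """Laplace transform of ((lam/mu) * (1 - exp(-mu*t)))**r / r!, term by term.

    Uses (1 - e^{-mu t})^r = sum_j (-1)^j C(r, j) e^{-j mu t}.
    """
    transform = sum((-1) ** j * comb(r, j) / (s + j * mu) for j in range(r + 1))
    return (lam / mu) ** r / factorial(r) * transform


lam, mu, s = 2.0, 1.5, 0.7   # hypothetical values
gaps = [abs(beta_formula(r, s, lam, mu) - beta_poisson(r, s, lam, mu))
        for r in range(6)]
```

For r = 1 both expressions reduce to $\lambda / [s(s + \mu)]$, which is easy to confirm by hand.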
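It is to be remarked that there exists a constant C so that

(20) $B_r(t) \le \dfrac{C^r}{r!}$

for all $t \ge 0$. This can be proved by virtue of (18).

REMARK. Since

(21) $B_r(t) = \sum_{k=r}^{\infty} \binom{k}{r} P_k(t)$

and $B_r(t) \le C^r / r!$, ($r = 0, 1, 2, \ldots$), we obtain easily that

(22) $P_k(t) = \sum_{r=k}^{\infty} (-1)^{r-k} \binom{r}{k} B_r(t).$

Hence, specifically,

(23) $\int_0^\infty e^{-st} P_k(t)\, dt = \sum_{r=k}^{\infty} (-1)^{r-k} \binom{r}{k} \dfrac{1}{s + r\mu} \prod_{i=0}^{r-1} \dfrac{\varphi(s + i\mu)}{1 - \varphi(s + i\mu)}.$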

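6. The transitions $E_k \to E_{k+1}$.

THEOREM 3. If $m_k(t)$ denotes the expectation of the number of transitions
$E_k \to E_{k+1}$ occurring in the time interval (0, t], then we have

(24) $\int_0^\infty e^{-st}\, dm_k(t) = \sum_{r=k}^{\infty} (-1)^{r-k} \binom{r}{k} \prod_{i=0}^{r} \dfrac{\varphi(s + i\mu)}{1 - \varphi(s + i\mu)}, \qquad (k = 0, 1, 2, \ldots).$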


PROOF. Applying the theorem of total probability we can write

(25) $P_k(t) = \sum_{j=k}^{\infty} \binom{j}{k} \int_0^t e^{-k\mu(t - u)} \left[1 - e^{-\mu(t - u)}\right]^{j-k} [1 - F(t - u)]\, dm_{j-1}(u).$

This follows from the fact that the event that there is a state $E_k$ at the
instant t can occur in several mutually exclusive ways: the last transition in the
time interval (0, t] is $E_{j-1} \to E_j$ ($j = k, k + 1, \ldots$) and this
transition takes place at the instant u ($0 \le u \le t$), and in the time interval
(u, t] there does not occur any new call, but j - k conversations terminate.

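Hence,

(26) $B_r(t) = \sum_{k=r}^{\infty} \binom{k}{r} P_k(t) = \sum_{j=r}^{\infty} \binom{j}{r} \int_0^t e^{-r\mu(t - u)} [1 - F(t - u)]\, dm_{j-1}(u),$

where $\binom{k}{r}\binom{j}{k} = \binom{j}{r}\binom{j-r}{k-r}$ has been used.
Forming the Laplace transform of (26), we have

$$\beta_r(s) = \frac{1 - \varphi(s + r\mu)}{s + r\mu} \sum_{j=r}^{\infty} \binom{j}{r} \int_0^\infty e^{-st}\, dm_{j-1}(t).$$

Now by the aid of (12) we obtain

$$\sum_{j=r}^{\infty} \binom{j}{r} \int_0^\infty e^{-st}\, dm_{j-1}(t) = \frac{1}{1 - \varphi(s + r\mu)} \prod_{i=0}^{r-1} \frac{\varphi(s + i\mu)}{1 - \varphi(s + i\mu)}.$$

Multiplying both sides of this formula by $(-1)^{r-l}\binom{r}{l}$ and summing
over $r = l, l + 1, \ldots$ we obtain

(27) $\int_0^\infty e^{-st}\, dm_{l-1}(t) = \sum_{r=l}^{\infty} (-1)^{r-l} \binom{r}{l} \dfrac{1}{1 - \varphi(s + r\mu)} \prod_{i=0}^{r-1} \dfrac{\varphi(s + i\mu)}{1 - \varphi(s + i\mu)}.$

If we write l = k + 1 and use
$1/(1 - \varphi) = 1 + \varphi/(1 - \varphi)$ together with
$\binom{r+1}{k+1} - \binom{r}{k+1} = \binom{r}{k}$, then

(28) $\int_0^\infty e^{-st}\, dm_k(t) = \sum_{r=k}^{\infty} (-1)^{r-k} \binom{r}{k} \prod_{i=0}^{r} \dfrac{\varphi(s + i\mu)}{1 - \varphi(s + i\mu)},$

which was to be proved.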


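7. The distributions $R_k(x)$ and $R_k^*(x)$.

THEOREM 4. We have

(29) $\int_0^\infty e^{-sx}\, dR_k^*(x) = \left[ \sum_{j=0}^{k+1} \binom{k+1}{j} \prod_{i=0}^{j-1} \dfrac{1 - \varphi(s + i\mu)}{\varphi(s + i\mu)} \right]^{-1}$

and

(30) $\int_0^\infty e^{-sx}\, dR_k(x) = 1 - \left[ \sum_{j=0}^{k+1} \binom{k+1}{j} \prod_{i=0}^{j-1} \dfrac{1 - \varphi(s + i\mu)}{\varphi(s + i\mu)} \right]^{-1} \left[ \sum_{r=k}^{\infty} (-1)^{r-k} \binom{r}{k} \prod_{i=0}^{r} \dfrac{\varphi(s + i\mu)}{1 - \varphi(s + i\mu)} \right]^{-1}$

if $\Re(s) \ge 0$.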


PROOF. Denote by $G_0(x), G_1(x), \ldots, G_k(x)$ the distribution functions of the
distances between the successive transitions $E_{-1} \to E_0$, $E_0 \to E_1$,
$E_1 \to E_2$, ..., $E_k \to E_{k+1}$, respectively. It is easy to see that the
distribution functions $G_r(x)$ ($r = 0, 1, \ldots, k$) are just Palm's
distribution functions defined by (6). Now clearly

(31) $R_k^*(x) = G_0 * G_1 * \cdots * G_k(x),$

and thus,

(32) $\int_0^\infty e^{-sx}\, dR_k^*(x) = \gamma_0(s) \gamma_1(s) \cdots \gamma_k(s),$

where $\gamma_r(s)$ ($r = 0, 1, 2, \ldots$) is defined by (6). This proves (29).
On the other hand, if

(33) $\psi_k(s) = \int_0^\infty e^{-sx}\, dR_k(x),$

then we have

(34) $\int_0^\infty e^{-st}\, dm_k(t) = \dfrac{\gamma_0(s) \gamma_1(s) \cdots \gamma_k(s)}{1 - \psi_k(s)}.$

For, as is well known in renewal theory, we have

$$m_k(t) = R_k^*(t) + R_k^* * R_k(t) + R_k^* * R_k * R_k(t) + \cdots.$$

Taking into consideration (6) and (24), we can determine $\psi_k(s)$ from (34), and
thus we get (30).
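THEOREM 5. We have

(35) $\mu_k = \dfrac{\alpha}{\sum_{r=k}^{\infty} (-1)^{r-k} \binom{r}{k} C_r}$

and

(36) $\sigma_k^2 = 2\alpha\mu_k \sum_{j=1}^{k+1} \binom{k+1}{j} \dfrac{1}{C_{j-1}} - \mu_k^2 \left[ 1 - \dfrac{\sigma^2 - \alpha^2}{\alpha^2} \sum_{r=k}^{\infty} (-1)^{r-k} \binom{r}{k} C_r - \dfrac{2}{\alpha} \sum_{r=k}^{\infty} (-1)^{r-k} \binom{r}{k} C_r \sum_{i=1}^{r} \dfrac{\varphi'(i\mu)}{\varphi(i\mu)[1 - \varphi(i\mu)]} \right],$

where

$$C_r = \prod_{i=1}^{r} \frac{\varphi(i\mu)}{1 - \varphi(i\mu)},$$

the empty product being 1.

PROOF. Since

$$\sum_{j=0}^{k+1} \binom{k+1}{j} \prod_{i=0}^{j-1} \frac{1 - \varphi(s + i\mu)}{\varphi(s + i\mu)} = 1 + \alpha s \sum_{j=1}^{k+1} \binom{k+1}{j} \frac{1}{C_{j-1}} + o(s)$$

and

$$\sum_{r=k}^{\infty} (-1)^{r-k} \binom{r}{k} \prod_{i=0}^{r} \frac{\varphi(s + i\mu)}{1 - \varphi(s + i\mu)} = \frac{1}{\alpha s} \sum_{r=k}^{\infty} (-1)^{r-k} \binom{r}{k} C_r + \frac{\sigma^2 - \alpha^2}{2\alpha^2} \sum_{r=k}^{\infty} (-1)^{r-k} \binom{r}{k} C_r + \frac{1}{\alpha} \sum_{r=k}^{\infty} (-1)^{r-k} \binom{r}{k} C_r \sum_{i=1}^{r} \frac{\varphi'(i\mu)}{\varphi(i\mu)[1 - \varphi(i\mu)]} + o(1)$$

as $s \to 0$, we obtain easily that

$$\psi_k(s) = 1 - \mu_k s + \frac{\sigma_k^2 + \mu_k^2}{2} s^2 + o(s^2),$$

as $s \to 0$, where $\mu_k$ and $\sigma_k^2$ are defined by (35) and (36)
respectively. This proves the theorem.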
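
A small numerical sketch of the mean recurrence time follows. It assumes that (35) reads $\mu_k = \alpha / \sum_{r \ge k} (-1)^{r-k} \binom{r}{k} C_r$ with $C_r = \prod_{i=1}^{r} \varphi(i\mu)/[1 - \varphi(i\mu)]$, truncates the series after a hypothetical number of terms, and compares the result in the Poisson case with $1/(\lambda P_k)$, where $P_k$ is the stationary Poisson probability of k busy channels. The function name and the numerical values are illustrative only.

```python
from math import comb, exp, factorial


def mean_recurrence_time(k, phi, alpha, mu, terms=60):
    """Mean distance between consecutive transitions E_k -> E_{k+1}, series truncated."""
    series = 0.0
    c_r = 1.0                      # C_0 is the empty product
    for r in range(terms):
        if r >= 1:
            f = phi(r * mu)
            c_r *= f / (1.0 - f)   # C_r = C_{r-1} * phi(r mu) / (1 - phi(r mu))
        if r >= k:
            series += (-1) ** (r - k) * comb(r, k) * c_r
    return alpha / series


lam, mu = 2.0, 1.5                 # hypothetical Poisson density and service rate
rho = lam / mu


def phi_poisson(s):
    """Transform of the exponential inter-arrival law with mean 1/lam."""
    return lam / (lam + s)


computed = [mean_recurrence_time(k, phi_poisson, 1.0 / lam, mu) for k in range(4)]
# In equilibrium, transitions E_k -> E_{k+1} occur at rate lam * P_k.
expected = [1.0 / (lam * exp(-rho) * rho ** k / factorial(k)) for k in range(4)]
relative_gaps = [abs(c - e) / e for c, e in zip(computed, expected)]
```

In the Poisson case $C_r = (\lambda/\mu)^r / r!$, so the alternating series sums to $e^{-\lambda/\mu} (\lambda/\mu)^k / k!$ and the agreement is exact up to truncation and rounding.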


REFERENCES 
[1] W. Feller, "Fluctuation theory of recurrent events," Trans. Amer. Math. Soc., Vol.
67 (1949), pp. 98-119.

[2] R. Fortet, "Random distributions with an application to telephone engineering,"
Proceedings of the Third Berkeley Symposium on Mathematical Statistics and
Probability, Vol. II, University of California Press, 1956, pp. 81-88.

[3] C. Palm, "Intensitätsschwankungen im Fernsprechverkehr," Ericsson Technics No.
44 (1943), pp. 1-189.

[4] W. L. Smith, "Regenerative stochastic processes," Proc. Roy. Soc. Edinburgh, Sec.
A, 232 (1955), pp. 6-31.

[5] L. Takács, "On a probability problem arising in the theory of counters," Proc.
Cambridge Philos. Soc., Vol. 52 (1956), pp. 488-498.

[6] L. Takács, "On secondary stochastic processes generated by recurrent processes,"
Acta Mathematica Academiae Scientiarum Hungaricae, Vol. 7 (1956), pp. 17-29.

[7] L. Takács, "On a coincidence problem concerning telephone-traffic," Acta Mathematica
Academiae Scientiarum Hungaricae, Vol. 9 (1958), pp. 45-81.





ON THE INTEGRODIFFERENTIAL EQUATION OF TAKÁCS. II¹


By Edgar Reich
University of Minnesota 


1. Introduction. This paper continues (cf. [1]) the study of the properties of
the function $F(t, 0)$, $0 \le F(t, 0) = F(t) \le 1$, where
$F(t, x) = \Pr\{\eta(t) \le x\}$, $t \ge 0$, $x \ge 0$, satisfies²

(1.1) $\dfrac{\partial F(t, x)}{\partial t} = \dfrac{\partial F(t, x)}{\partial x} - \lambda(t) F(t, x) + \lambda(t) \int_0^x H(x - y)\, d_y F(t, y).$

The functions $\Phi(s) = \int_0^\infty e^{-sx}\, dF(0, x)$,
$H(x) = \int_0^x h(\xi)\, d\xi$ ($h(\xi) \ge 0$, $\int_0^\infty h(\xi)\, d\xi = 1$),
and $\lambda(t) = \Lambda'(t) \ge 0$ are given. It is assumed that there exists a
$c > 0$ such that $e^{cx} h(x) \in L^2(0, \infty)$. The moment
$\int_0^\infty x^i h(x)\, dx$, if it exists, is denoted by $\mu_i$. We put
$\psi(s) = \int_0^\infty h(x) e^{-sx}\, dx$. Furthermore, we suppose that
$\int_0^T [\lambda(t)]^2\, dt$ exists as a possibly improper Riemann integral for
all $T > 0$.

The stochastic process $\eta(t)$ represents the waiting time of a customer arriving
at time t in a queue with Poisson arrivals of variable density $\lambda(t)$, with
H(x) the distribution of service times. F(t) is the probability that the counter is
unoccupied at time t. Our present purpose is to study the behavior of F(t) for
large t, especially under conditions that turn out to guarantee that F(t) does not
approach zero. Previous knowledge ([2], [3], [4]) in this direction appears to be
restricted essentially to the case $\lambda(t) = \text{const}$.

The following was proved in [1], although it does not appear as an explicit 
statement there. 

THEOREM 1. There is only one distribution-solution, F(t, x), of (1.1). Moreover,
F(t) is the unique continuous solution of the Volterra equation of the first kind

(1.2) $\int_0^t G(t, u) F(u)\, du = g(t), \qquad (\text{almost all } t \ge 0),$

where

$$G(t, u) = \text{P.V.}\ \frac{1}{2\pi i} \int_{x - i\infty}^{x + i\infty} \exp\{(t - u)s - [\Lambda(t) - \Lambda(u)][1 - \psi(s)]\}\, \frac{ds}{s},$$

$$g(t) = \frac{1}{2\pi i} \int_{x - i\infty}^{x + i\infty} \Phi(s) \exp\{ts - \Lambda(t)[1 - \psi(s)]\}\, \frac{ds}{s}.$$


Received May 20, 1958.

¹ This work was done with support under Contract Nonr-710(16).

² As the referee points out, the derivation of (1.1) in [2] implicitly assumes that F(t, x) is
differentiable in x, x > 0. However, it seems possible to prove the differentiability by an
argument based on that of [2], p. 108.


Our approach to the asymptotic behavior of F(t) will be through (1.2). This
will, of course, not be as simple as in the case $\lambda(t) = \text{const}$, when
(1.2) can be solved by Laplace transforms. The main results shall be based on the
restrictive assumption that $\psi(s)$ is regular at infinity, and in a
neighborhood of the imaginary axis. The class of such $\psi(s)$ is, however, still
large enough to include as a proper subclass the important class of rational
$\psi(s)$.


[Figure: the contour C = C(A, δ, x) in the s plane.]


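2. Some Abelian lemmas.

LEMMA 2.1. Suppose $\psi(s)$ is regular at infinity and in a neighborhood of the
imaginary axis, and $\beta\mu_1/\alpha \le \rho < 1$. Then there exist functions
$k_i(\rho) > 0$, $i = 1, 2$, such that

$$\left| \text{P.V.}\ \frac{1}{2\pi i} \int_{x - i\infty}^{x + i\infty} \exp\{\alpha s - \beta[1 - \psi(s)]\}\, \frac{ds}{s} - 1 \right| \le k_1(\rho)\, e^{-k_2(\rho)\alpha}, \qquad \alpha \ge 0,\ \beta \ge 0.$$

PROOF. If $A > 0$ and $\delta > 0$ are chosen so that $\psi$ is regular for
$|\Im s| \ge A$ and $\Re s \ge -\delta$, we can write the quantity inside the
absolute value signs as $H(\alpha, \beta) = (1/2\pi i) \int_C$, where
$C = C(A, \delta, x)$ is the contour shown in the figure.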

Since ¥(«) = 0 we may choose A sufficiently large so that | ¥(z + iA) | < 1. 
The function ¥(s) is of the form ¥(s) = 1 —ms + 42/28’ + O(s*), wu; > 0. Re- 
calling that uw, > 0, let us choose 6 sufficiently small so that 


(2.1) [t= A msg 


and < 0, on the segments [—5 + &, &], and [—5 — 6, —4i], respectively. 
Furthermore, we choose 6 = 4(p) sufficiently small so that, also, 
'y—1+ ms | 1 —1/2 
MAK: | Seenctee | test — 1). 
v2” 


sis 25 mis 2 
Finally, we select the number x = x(p),0 < x < 6, so that 
RY(— x + ty) <1 when 6 S |y| SA. 


(This is possible since RY(iy) = Jt cos (yE)h(E) dé < 1 when|y| > 0.) In the 
proof of the lemma we may assume that a 2 1, because in the opposite case 
both a and 8 are bounded from above, and the truth of the lemma follows from 
the fact that, because of the continuity of H(a, 8), at most an adjustment of 
k(p) is necessary. 

On the segment [—5 — 61, —4 + i], 


Rias — ali _ ¥(s)1) ™ RY le — Bus (: “ > mt) |. 


¥—1+ms8| 


M8 


S (a — Bu)(—8) + Bu |s| S (a — Bu) (—8) 


+ Burr/26 y, (po? — 1) S —ad(1 —p"”). 


646: 9 aed 
/ a ° este” = Ox(p) exp [— C2(p)a] , C2(p) > 0. 


On the segments [—65 + 61, — x-+ 61] we find, by (2.1), that 


Klas — Bll — ¥(e)]} = (a — Bus) + Bun Go (fit) 


ui8 


= 54 (t 1+ me) | < a ?. Bur E i (t— itn) | Rs. 
m8 a M8 










Since Gs < 0, the above 





]} a0 5 a(t — ot + 6° - 11) as 


—a(l — p'”)x. 
Hence 


—b—si —x+bi sa 
ol + cd < 2(6 : x) gxinea Cy(p)e-24* Calo) > 0. 


In view of the choice of x, 


[+ [| s he = tne, ex) > 0. 


—x—At —xtbi E 











Finally, 


f+ fo | 3% [mae = = Cre °**, @s(p) > 0,if a = 1. 


Putting all the above inequalities together, we obtain the lemma. 
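COROLLARY. Suppose $\psi(s)$ is regular at infinity and in a neighborhood of the
imaginary axis, and

$$\limsup_{t - u \to \infty} \mu_1 \frac{\Lambda(t) - \Lambda(u)}{t - u} < 1.$$

Then

(2.2) $\int_0^t G(t, u) F(u)\, du = \int_0^t F(u)\, du + O(1), \qquad \text{as } t \to \infty.$

PROOF. There exists a $T > 0$ such that

$$\mu_1 \frac{\Lambda(t) - \Lambda(u)}{t - u} \le \rho < 1 \quad \text{whenever } t - u \ge T.$$

If $|G(t, u)| \le M$ for $|t - u| \le T$ then, for $t > T$,

$$\left| \int_0^t G(t, u) F(u)\, du - \int_0^t F(u)\, du \right| \le \left| \int_0^{t-T} (G(t, u) - 1) F(u)\, du \right| + (M + 1)T$$

$$\le \int_0^{t-T} k_1(\rho)\, e^{-k_2(\rho)(t - u)}\, du + (M + 1)T \le \frac{k_1(\rho)}{k_2(\rho)} + (M + 1)T = O(1).$$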

LEMMA 2.2. Suppose $\psi(s) = 1 - \mu_1 s + o(s)$, as $s \to 0$, uniformly with
respect to arg s, $|\arg s| \le \pi/2$, and $\mu_1 \Lambda(t)/t \le \rho < 1$. Then
$g(t) = t - \mu_1 \Lambda(t) + o(t)$, as $t \to \infty$.

PROOF. Let $C_r$ denote a contour along the imaginary axis, with a semicircular
detour of radius r, center 0, to the right. Since $|\Phi(s)| \le 1$ for
$\Re s \ge 0$, we have, according to a known Abelian theorem ([5], pp. 494-5),
mee i / [emacs GS 
g(t) = amet ®(s)e gut m A(t) + o(#), 
It therefore suffices to show that 


Dit) = g(t) _ gi(t) = x [ (se fetes ee ug = o(t), ant— a. 


Choose « > 0. Then, if | s| < 5(e),| ¥ —1+ms| S €| 8], and therefore 
|e t4 11s Aels| et”. 


Consider points +iA,0 <r< A S &(e) on C,. Evidently, 


| < et rAlry hed r | ds | | = ents + 2 log 4) ’ 
-ia | — | ” ’ 
Also 


—iA ie | * d d 4 
s—(1— — s y - 
|s2f ts—(1—p) A (t—p; A) s4f 9-5. 
i le — 6¢ l= s sy A 


Hence, 
4 (t—py A)r+Aea A 


Putting r = f°, A = €'t"’, where t => T(€) = [c8(6)]’, we find that 
D(t) S Kte log (1/¢) fort = T(e). 


We shall state the following without proof.

LEMMA 2.3. If $\psi(s)$ and $\Phi(s)$ are regular in a neighborhood of s = 0, and if
$\mu_1 \Lambda(t)/t \le \rho < 1$ then $g(t) = t - \mu_1 \Lambda(t) + O(1)$, as
$t \to \infty$.


3. Asymptotic average of F(t). In the special case
$\lambda(t) = \lambda = \text{const}$ it is known [4] that, for the distribution
solution F(t, x) of (1.1), $\mu_1\lambda < 1$ implies
$\lim_{t \to \infty} F(t, 0) = 1 - \mu_1\lambda$. In view of our above results we
are able to make the following statements in the general case.

THEOREM 2. Suppose $\psi(s)$ is regular at infinity and in a neighborhood of the
imaginary axis, and

$$\limsup_{t - u \to \infty} \mu_1 \frac{\Lambda(t) - \Lambda(u)}{t - u} < 1.$$

Then, if F(t, x) is the distribution solution of (1.1),

$$\frac{1}{T} \int_0^T F(t)\, dt = 1 - \mu_1 \frac{\Lambda(T)}{T} + o(1), \qquad \text{as } T \to \infty.$$

THEOREM 3. If we also assume that $\Phi(s)$ is regular in a neighborhood of s = 0
then "o(1)" in the conclusion of Theorem 2 can be replaced by "$O(T^{-1})$."





REFERENCES


[1] E. Reich, "On the integrodifferential equation of Takács. I," Ann. Math. Stat., Vol.
29 (1958), pp. 563-570.

[2] L. Takács, "Investigation of waiting time problems by reduction to Markov processes,"
Acta Math. Hung., Vol. 6 (1955), pp. 101-129.

[3] D. G. Kendall, "Some problems in the theory of queues," J. Roy. Stat. Soc. (B), Vol.
13 (1951), pp. 151-173.

[4] V. E. Beneš, "On queues with Poisson arrivals," Ann. Math. Stat., Vol. 28 (1957), pp.
670-677.

[5] G. Doetsch, Handbuch der Laplace-Transformation I, Birkhäuser Verlag, 1950.





SYMMETRIZABLE MARKOV MATRICES 
By H. P. Kramer 


Bell Telephone Laboratories, Inc., Murray Hill, New Jersey 


Introduction. Suppose that the evolution of the state probabilities $p_i(t)$ of a
Markov process is governed by the system of differential equations

(1) $\dfrac{dp_i}{dt} = \sum_{j=1}^{N} Q_{ij}\, p_j,$

where $Q_{ij}$ represents the transition rate from state $S_j$ to state $S_i$
([1], p. 235). In many applications one is interested primarily in knowing the
equilibrium state probabilities $\pi_i$ defined by
$\pi_i = \lim_{t \to \infty} p_i(t)$, which, if they exist, can be obtained
by solving the system of homogeneous linear equations

(2) $\sum_{j=1}^{N} Q_{ij}\, \pi_j = 0, \qquad i = 1, 2, \ldots, N.$

While (2) can be solved in principle (and in practice if N is not too large), the
solution in general does not fulfill the ultimate desideratum of being susceptible
to representation as a simple function of the transition rates $Q_{ij}$. If the
states are simply ordered and a transition from a given state $S_i$ can occur only
to a neighboring state $S_{i-1}$ or $S_{i+1}$ then the equilibrium probability
$\pi_i$ satisfies the following simple formula

(3) $\pi_i = \dfrac{\lambda_0 \lambda_1 \cdots \lambda_{i-1}}{\mu_0 \mu_1 \cdots \mu_{i-1}}\, \pi_0,$

where $\lambda_i$ is the transition rate from state $S_i$ to state $S_{i+1}$ and
$\mu_i$ the transition rate from state $S_{i+1}$ to state $S_i$, and $\pi_0$ is
chosen so that

(4) $\sum_i \pi_i = 1.$

(4) X r= 1, 
Processes of this sort, with simply ordered sets of states, are called birth and
death processes. The discussion of Section 1 below deals with a class of Markov
matrices Q which includes the set of birth and death matrices and allows
representations analogous to (3) for the equilibrium probabilities. In Section 2
it is shown that all the matrices in this class have non-positive characteristic
values and in consequence of this fact, the difference between the state
probability $p_i(t)$ and the corresponding equilibrium probability $\pi_i$ is
majorized by a function of t and the set of initial state probabilities $p_i(0)$.
In Section 3 the foregoing theory is illustrated by an anisotropic random walk.

The following notions and notations are used. Let Q be an N × N matrix.
The graph G(Q) associated with Q consists of vertices $V_1, V_2, \ldots, V_N$ and
of directed line segments joining certain of the vertices. The directed line
segment $V_iV_j$ connecting $V_i$ to $V_j$ belongs to G(Q) if, and only if,
$Q_{ij} \ne 0$. If $k_1, k_2, \ldots, k_s$ is a sequence of indices, the product
of matrix elements $Q_{k_1k_2} Q_{k_2k_3} \cdots Q_{k_{s-1}k_s}$ shall be denoted
by the symbol $[k_1 k_2 \cdots k_s]$. $V_i$ shall be said to be connected to $V_j$
if there exists a sequence of indices $k_1, k_2, \ldots, k_s$ such that
$[j k_1 k_2 \cdots k_s i] \ne 0$. If $\mu$ is a sequence of N positive numbers,
$l_2(\mu)$ shall refer to an inner product space of sequences of N complex numbers
with the inner product given by

$$(x, y) = \sum_{i=1}^{N} x_i \bar{y}_i \mu_i.$$

Every N × N matrix represents an operator on $l_2(\mu)$. A matrix A is called
$l_2(\mu)$-symmetric if (Ax, y) = (x, Ay) for every $x \in l_2(\mu)$ and every
$y \in l_2(\mu)$.

Received April 10, 1958.


1. The equilibrium probabilities. Let $\mu$ be a sequence of N positive numbers.
It is clear that a matrix A is $l_2(\mu)$-symmetric if, and only if, for every pair
of indices i and j,

(5) $A_{ij}\mu_i = A_{ji}\mu_j.$

What conditions must a matrix A satisfy in order to be $l_2(\mu)$-symmetric for
some sequence $\mu$? The answer is that the matrix is characterized by the "loop
condition", i.e., the product of matrix elements around a closed loop is
independent of the direction in which the loop is traversed. This statement is made
precise in

THEOREM 1. A matrix A with non-negative off-diagonal elements $A_{ij}$ such that
$A_{ij} \ne 0$ implies $A_{ji} \ne 0$ is $l_2(\mu)$-symmetric for some sequence
$\mu$ of N positive numbers if, and only if, for every index i and every sequence
of indices $k_1, k_2, \ldots, k_s$,

(6) $[i k_1 k_2 \cdots k_s i] = [i k_s k_{s-1} \cdots k_1 i].$


PROOF. Suppose A is $l_2(\mu)$-symmetric. A straightforward induction based on
(5) shows that (6) holds.

Now suppose that (6) holds for every sequence of indices
$(k_1, k_2, \ldots, k_s)$ and every index i. The proof shall be based on the
construction of a sequence $\mu$ of N positive numbers such that A is
$l_2(\mu)$-symmetric.

By hypothesis, if $V_iV_j$ is in G(A), then so is $V_jV_i$. Thus G(A) is the union
of graphs $G(A) = G_1 + G_2 + \cdots + G_m$, which are pairwise disconnected, each
one separately being, however, a connected graph.¹ Let $G_l$ be a particular one
of these subgraphs and $V_i$ one of its vertices. Let $\mu_i > 0$ and for every
$V_j \in G_l$ define

(7) $\mu_j = \mu_i \dfrac{[i k_1 k_2 \cdots k_s j]}{[j k_s k_{s-1} \cdots k_1 i]},$

where the chain of vertices in $G_l$ with indices $k_1, k_2, \ldots, k_s$ connects
i to j. It is clear that (6) guarantees the uniqueness of $\mu_j$ once $\mu_i$ has
been chosen. For, suppose $k_1', k_2', \ldots, k_t'$ were another chain connecting
i to j. Let

$$\mu_j' = \mu_i \frac{[i k_1' k_2' \cdots k_t' j]}{[j k_t' k_{t-1}' \cdots k_1' i]}.$$

(6) requires that

$$[i k_1 k_2 \cdots k_s j][j k_t' k_{t-1}' \cdots k_1' i] = [i k_1' \cdots k_t' j][j k_s \cdots k_1 i]$$

and thus, $\mu_j' = \mu_j$. The procedure is repeated for all $G_l$. For the
resulting sequence $\mu$ it is apparent that all $\mu_j > 0$ and that (5) is
satisfied.

¹ A graph G is called connected if a) in case the line $V_iV_j \in G$ then also $V_jV_i \in G$ and b)
every pair of vertices is connected by at least one chain of lines.

COROLLARY 1. If A has non-negative off-diagonal elements and the graph G(A)
consists of only one connected set and (6) is satisfied, then there exists a unique
sequence $\mu$ of positive numbers such that A is $l_2(\mu)$-symmetric and

(8) $\mu_1^{-1} + \mu_2^{-1} + \cdots + \mu_N^{-1} = 1.$

In addition to possessing non-negative off-diagonal elements, a Markov matrix
Q has the property that

(9) $Q_{ii} = -\sum_{j \ne i} Q_{ji}$

and thus that the sum of all of the elements in a column vanishes. A consequence
of this fact is

THEOREM 2. If a Markov matrix Q is $l_2(\mu)$-symmetric for some sequence $\mu$
satisfying (8) and its graph is connected, then its equilibrium probabilities
$\pi_i$ are given by $\pi_i = \mu_i^{-1}$.

PROOF. By hypothesis (5) holds: $Q_{ij}\mu_j^{-1} = Q_{ji}\mu_i^{-1}$. Summation of
both members with respect to i and use of (9) complete the proof. The preceding
results are summarized in

THEOREM 3. If the evolution of state probabilities of a Markov process is governed
by (1) and transition is possible between any pair of states in one or more jumps
and for every $i$ and every set $(k_\alpha)$ of indices, $[i k_1 k_2 \cdots k_r i] = [i k_r k_{r-1} \cdots k_1 i]$,
and $Q_{ij} \neq 0$ implies $Q_{ji} \neq 0$, then the equilibrium probability $\pi_s$ is given by

(10) $$\pi_s = [s\; s-1\; \cdots\; 1]\,\pi_1 / [1\; 2\; \cdots\; s],$$

where the initial state has been chosen arbitrarily and the set of states $S_1, S_2, \cdots, S_s$
has been picked so that the product in the denominator does not vanish. Finally,
of course, (10) together with $\sum_{k=1}^{N} \pi_k = 1$ prescribes a unique set of equilibrium
probabilities.
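The path-product rule (10) can be tried out on a small chain. The following is only a sketch under stated assumptions: the bracket symbol is read as a product of matrix entries along the listed indices, `Q[a][b]` is taken to be the rate of jumping from state `b` to state `a` (so that columns sum to zero, as in (9)), and the birth-and-death rates are hypothetical values chosen for illustration.

```python
def birth_death_matrix(up, down):
    """Rate matrix of a chain on states 0..n-1; Q[a][b] is the rate of jumping from b to a."""
    n = len(up) + 1
    Q = [[0.0] * n for _ in range(n)]
    for k in range(n - 1):
        Q[k + 1][k] += up[k]        # jump k -> k+1
        Q[k][k] -= up[k]
        Q[k][k + 1] += down[k]      # jump k+1 -> k
        Q[k + 1][k + 1] -= down[k]
    return Q


def equilibrium_by_path_products(Q):
    """Apply the ratio of path products along the chain 0, 1, ..., n-1, then normalise."""
    weights = [1.0]                 # value assigned to the arbitrarily chosen initial state
    for s in range(1, len(Q)):
        # extend the path by one step: forward rate divided by backward rate
        weights.append(weights[-1] * Q[s][s - 1] / Q[s - 1][s])
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical rates, used only to exercise the formula.
Q = birth_death_matrix(up=[2.0, 1.0, 3.0], down=[1.0, 4.0, 0.5])
pi = equilibrium_by_path_products(Q)
residual = max(abs(sum(q * x for q, x in zip(row, pi))) for row in Q)
```

If the reading of (10) assumed here is right, `residual` should be zero up to rounding, since every row of `Q` applied to `pi` then vanishes.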


2. The approach to equilibrium. If a transition rate matrix $Q$ is $l_2(\mu)$-symmetric
then it represents a symmetric operator on $l_2(\mu)$. But every characteristic value
of $Q$ is of the form $\lambda(x) = (Qx, x)/(x, x)$ for some sequence $x \in l_2(\mu)$. Therefore
$\lambda(x) \le 0$ implies that all characteristic values of $Q$ are non-positive. We have


N N 
(11) Qe, y) = 1/2 2 Gime = wm) Quay — Qxd, 


tol jue 







but Qi ju; = Qiu; . Therefore, 


N 


(Qz, y) = —1/20 


os Qii/ mi (Yi ws - Yj My) (a; pi = 5 ms), 
and thus $\lambda(x) \le 0$. The result is stated in

THEOREM 4. If a Markov matrix is $l_2(\mu)$-symmetric then all of its characteristic
values are non-positive. Zero is the largest characteristic value.

The importance of the characteristic values of Q lies in the fact that they con- 
trol the approach to equilibrium. This fact is elucidated in 

THEOREM 5. Let $Q$ be an $l_2(\mu)$-symmetric Markov matrix with connected graph.
Let $\pi$ be the sequence of equilibrium probabilities. Let $p(t)$ be the solution of (1) with
initial value $p(0)$. If $-\lambda_1$ and $-\lambda_M$ are the largest and smallest negative
characteristic values of $Q$ then

$$\exp(-\lambda_M t)\,\|p(0) - \pi\| \le \|p(t) - \pi\| \le \exp(-\lambda_1 t)\,\|p(0) - \pi\|.$$

PROOF. Letting $r(t) = p(t) - \pi$, (1) implies that

$$\frac{d}{dt}\|r\|^2 = 2\left(r, \frac{dr}{dt}\right) = 2(r, Qr).$$

Since $p(t) = a\pi + q$, where $(q, \pi) = 0$, we have

$$a = (p(t), \pi) = \sum_{k=1}^{N} p_k(t)\,\pi_k\,\pi_k^{-1} = \sum_{k=1}^{N} p_k(t) = 1,$$

and thus $r(t)$ is orthogonal to $\pi$. Hence, on identifying characteristic values by
their variational properties (see, for example, [2], p. 230),

$$-2\lambda_M = 2\min_{q} \frac{(q, Qq)}{\|q\|^2} \le \frac{d\|r\|^2/dt}{\|r\|^2} \le 2\max_{(q,\pi)=0} \frac{(q, Qq)}{\|q\|^2} = -2\lambda_1.$$

Integration and identification of $r(t)$ and the constant of integration complete
the proof.
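The two-sided bound of Theorem 5 can be checked numerically on a small example. The sketch below is illustrative only: the three-state chain, its rates, the step size, and the Runge-Kutta integrator are all assumptions chosen here, not taken from the paper. The matrix is the symmetric nearest-neighbour chain on three states, whose characteristic values are 0, -1 and -3, so that the uniform distribution is the equilibrium.

```python
import math

Q = [[-1.0, 1.0, 0.0],
     [1.0, -2.0, 1.0],
     [0.0, 1.0, -1.0]]            # columns sum to zero; characteristic values 0, -1, -3
PI = [1.0 / 3.0] * 3               # equilibrium probabilities of this chain
LAMBDA_1, LAMBDA_M = 1.0, 3.0      # so -LAMBDA_1 and -LAMBDA_M are the extreme negative values


def apply(matrix, p):
    return [sum(q * x for q, x in zip(row, p)) for row in matrix]


def rk4_step(p, dt):
    """One classical Runge-Kutta step for dp/dt = Q p."""
    k1 = apply(Q, p)
    k2 = apply(Q, [x + 0.5 * dt * k for x, k in zip(p, k1)])
    k3 = apply(Q, [x + 0.5 * dt * k for x, k in zip(p, k2)])
    k4 = apply(Q, [x + dt * k for x, k in zip(p, k3)])
    return [x + dt * (a + 2 * b + 2 * c + d) / 6.0
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)]


def distance(p):
    """Norm of p - pi with coordinate k weighted by 1/pi_k."""
    return math.sqrt(sum((x - m) ** 2 / m for x, m in zip(p, PI)))


def check_bounds(p0, t_end=2.0, dt=0.001, tol=1e-9):
    d0 = distance(p0)
    p, ok = list(p0), True
    for n in range(1, int(round(t_end / dt)) + 1):
        p = rk4_step(p, dt)
        t, d = n * dt, distance(p)
        lower = math.exp(-LAMBDA_M * t) * d0
        upper = math.exp(-LAMBDA_1 * t) * d0
        ok = ok and (lower - tol <= d <= upper + tol)
    return ok


bounds_hold = check_bounds([1.0, 0.0, 0.0])
```

Starting from a point mass, the deviation from equilibrium mixes both decaying modes, so the distance should stay strictly between the two exponential envelopes.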


3. A homogeneous anisotropic random walk on a finite lattice. Suppose a
particle moves about among the points of a finite 3-dimensional lattice and that
the conditional probability that at time $t + \Delta t$ it is at point $P' = (x_1', x_2', x_3')$
having been at $P = (x_1, x_2, x_3)$ at time $t$ is given by:

$M_1$ for $x_1' = x_1 + 1$, $x_2' = x_2$, $x_3' = x_3$
$M_2$ for $x_1' = x_1 - 1$, $x_2' = x_2$, $x_3' = x_3$
$M_3$ for $x_1' = x_1$, $x_2' = x_2 + 1$, $x_3' = x_3$
$M_4$ for $x_1' = x_1$, $x_2' = x_2 - 1$, $x_3' = x_3$
$M_5$ for $x_1' = x_1$, $x_2' = x_2$, $x_3' = x_3 + 1$
$M_6$ for $x_1' = x_1$, $x_2' = x_2$, $x_3' = x_3 - 1$
$0$ otherwise.

Transition rates to points other than those of the lattice vanish. The resulting
process is $l_2(\mu)$-symmetric and thus if $\pi_P$ denotes the equilibrium probability of
the particle's being at $P = (x_1, x_2, x_3)$ and $\pi_0$ that of being at the origin, then

$$\pi_P = \pi_0 \left(\frac{M_1}{M_2}\right)^{x_1} \left(\frac{M_3}{M_4}\right)^{x_2} \left(\frac{M_5}{M_6}\right)^{x_3}.$$
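The product form of the equilibrium distribution can be verified directly on a small lattice. This is a minimal sketch under assumptions made here: the lattice is a box of hypothetical size 3 x 2 x 2 with the origin at one corner, the six rates are arbitrary positive numbers, and `Q[a][b]` is the rate of jumping from point `b` to point `a`.

```python
from itertools import product

# Hypothetical rates M1..M6 for the moves +x1, -x1, +x2, -x2, +x3, -x3.
RATES = (2.0, 1.0, 0.5, 1.5, 1.0, 3.0)
STEPS = ((0, 1), (0, -1), (1, 1), (1, -1), (2, 1), (2, -1))


def build_generator(shape, rates=RATES):
    """Return the lattice points and the rate matrix of the walk confined to the box."""
    points = list(product(*(range(n) for n in shape)))
    index = {p: k for k, p in enumerate(points)}
    Q = [[0.0] * len(points) for _ in points]
    for p in points:
        b = index[p]
        for (axis, delta), rate in zip(STEPS, rates):
            target = list(p)
            target[axis] += delta
            a = index.get(tuple(target))
            if a is not None:       # rates to points outside the lattice vanish
                Q[a][b] += rate
                Q[b][b] -= rate
    return points, Q


def product_form(points, rates=RATES):
    """Equilibrium probabilities proportional to (M1/M2)^x1 (M3/M4)^x2 (M5/M6)^x3."""
    m1, m2, m3, m4, m5, m6 = rates
    weights = [(m1 / m2) ** x1 * (m3 / m4) ** x2 * (m5 / m6) ** x3
               for x1, x2, x3 in points]
    total = sum(weights)
    return [w / total for w in weights]


points, Q = build_generator((3, 2, 2))
pi = product_form(points)
residual = max(abs(sum(q * x for q, x in zip(row, pi))) for row in Q)
```

Each pair of neighbouring points balances separately (for example, the flow $M_1 \pi_P$ in the $+x_1$ direction equals the return flow $M_2 \pi_{P'}$), which is why `residual` should vanish up to rounding.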


It is a pleasure to acknowledge the help provided through stimulating
conversations with my colleagues J. Tukey and V. E. Beneš.

(Added in proofs.) There is considerable overlap between the work reported
here and the results (particularly Theorem VIII) of the Research Announcement,
"Integral Representations for Markov Transition Probabilities," by D. G. Kendall,
Bulletin A.M.S. 64 (1958), 358-362. The notion of symmetrizability was suggested
to the author, as it seems to have been to D. G. Kendall (who calls it
reversibility), by the spectral decomposition of Birth and Death Markov matrices
effected by Kac [(1)], Ledermann and Reuter [7], and Karlin and McGregor [2].
(References are to the bibliography of the above cited announcement.)


REFERENCES 


[1] J. L. Doob, Stochastic Processes, John Wiley and Sons, New York, 1952.

[2] F. Riesz et B. Sz. Nagy, Leçons d'Analyse Fonctionnelle, Académie des Sciences de
Hongrie, Budapest, 1952.





ON SOME STATISTICAL TESTS FOR MTH ORDER MARKOV CHAINS¹
By Leo A. Goodman
University of Chicago 


1. Introduction and summary. Certain $\psi^2$ statistics were defined by Good in
[8] and he stated that there is a "strong analogy" between certain functions of
these statistics and some given likelihood ratio (LR) statistics appropriate for
testing hypotheses concerning the order of a Markov chain. He also indicated
that the analogy held when the hypothesis of "perfect randomness" is true.
The present author has indicated in [12] that, for the $\psi^2$ statistics in [8], this
analogy (i.e., the asymptotic equivalence of the corresponding statistics) does
not hold under some more general conditions when the "perfect randomness"
hypothesis is not true. It will be seen herein that certain functions of a modified
form of the $\psi^2$ statistics are asymptotically equivalent to certain LR statistics
in the more general case when the hypothesis $H(P_m)$ that the positively regular
Markov chain (see [2]) is governed by a completely specified system $P_m$ of mth
order transition probabilities is true. Also, certain functions of a different modified
form of the $\psi^2$ statistics will be seen to be asymptotically equivalent to
certain LR statistics in the case when the hypothesis $H_m$ that the positively
regular Markov chain is of order $m$ is true. These results are helpful in determining
the asymptotic distributions of various statistics and the null hypotheses
that can be tested with a given statistic. For example, if a given statistic $G$ is
asymptotically equivalent, under $H(P_1)$, to the LR statistic $L$ for testing the
null hypothesis $H(P_1)$ within the alternate hypothesis $H_2$, then the asymptotic
distribution, under $H(P_1)$, of $G$ will be $\chi^2$ with a known number of degrees of
freedom (i.e., with a known expectation); $G$ can be used directly to test $H(P_1)$
within $H_2$ (if $G$ is sensitive to these hypotheses), although the asymptotic
distribution, under $H_1$, of $G$ may differ, in a certain sense, from that of $L$ (see
Section 6 in [1]). However, if a given statistic $\Delta G$ is asymptotically equivalent, under
$H(P_1)$, to the LR statistic $\Delta L$ for testing the null hypothesis $H_1$ within the
alternate hypothesis $H_2$ (i.e., if there is an "ostensible analogy" between $\Delta G$ and
$\Delta L$), but this asymptotic equivalence does not hold under some more general
conditions (e.g., under $H_1$), then $\Delta G$ can not be used to test $H_1$ within $H_2$; the
asymptotic distribution, under $H(P_1)$, of $\Delta G$ will be $\chi^2$, but the asymptotic
distribution, under the null hypothesis $H_1$, will not be $\chi^2$, and furthermore the
expectation, under $H_1$, of $G$ can approach infinity (see [15]).

The present author has indicated in [15] that certain functions of a modified 


Received June 3, 1958; revised September 24, 1958. 

1 Research carried out at the Statistical Research Center, University of Chicago, under 
the sponsorship of the Statistics Branch, Office of Naval Research. Reproduction in whole 
or in part is permitted for any purpose of the United States Government. 

I am indebted to I. J. Good for bringing [18] to my attention, and to S. Ghurye for
helping with the Russian translation.




form of the $\psi^2$ statistics, which were investigated by Stepanow in [18] and which
are computed for a specified $P_1$, are asymptotically equivalent, under $H(P_1)$, to
certain LR statistics, but that they will not be equivalent under $H_1$. Although
it is stated in [18] that the results presented there can be applied to the solution
of the problem of testing the null hypothesis $H_1$, it is shown in [15] that none
of the statistics in [18] can be used directly to test this composite hypothesis.
In the present paper, it will be seen that a statistic based on a different modified
form of $\psi^2$, as well as certain other statistics described herein, will be
asymptotically equivalent, under $H_1$, to a certain LR statistic and can be used to test
the null hypothesis $H_1$.

The $\psi^2$ statistics defined by Good in [10] are more general than those given in
[3], [8], [18]. Besides studying the relation between these statistics and the LR
statistics, we shall also discuss certain conjectures proposed in [10] concerning
the asymptotic distributions of these statistics, which were investigated by
Billingsley [4] for the cases $H_0$ and $H_1$ (the author mentions that a more general
result for $H_m$ ($m \ge 0$) can be obtained using similar methods) and independently,
using different methods, by the present author [14] for the case $H_m$ ($m \ge 0$) when
the transition probabilities are all positive (this author also mentions that a more
general result can be obtained by similar methods). The $\psi^2$ and LR statistics
defined in [10] and [8], as well as some related statistics developed in the present
paper, will be generalized further herein, and the asymptotic distributions of
these generalized statistics will be investigated. This investigation leads to
generalizations of the asymptotic distributions obtained by Good [8], Billingsley
[3], [4], and the present author [13], [14], and it helps to clarify the relation between
the various statistics.

The different asymptotically equivalent forms of various statistics presented
here make it possible for the statistician to choose whichever form he finds
preferable both from the computational point of view and also from some other
viewpoints (see [1], [5], [11]).


2. The first order chain. Let $\{X_1, X_2, \cdots, X_n\}$ be an observed sequence
from a stochastic process. It will be convenient to deal herein with a circularized
sequence of observations obtained by regarding the first observation $X_1$ as
immediately following the nth observation $X_n$ (see [8], [12]). In this case, the
frequency $f(\mathbf{u}_s)$ of the $s$ consecutive observations (i.e., the $s$-tuple)
$\mathbf{u}_s = (u_1, u_2, \cdots, u_s)$ in the circularized sequence will be such that
$\sum_{u_1} f(\mathbf{u}_s) = \sum_{w_s} f(\mathbf{w}_s) = f(\mathbf{w}_{s-1})$, where
$\mathbf{w}_s = (w_1, w_2, \cdots, w_s)$, $(w_1, w_2, \cdots, w_{s-1}) = (u_2, u_3, \cdots, u_s) = \mathbf{w}_{s-1}$,
and $f(\mathbf{w}_{s-1})$ is the frequency of the $(s-1)$-tuple $\mathbf{w}_{s-1}$ in the circularized
sequence. A method of modifying results obtained for circularized sequences
so that they can be applied to noncircularized sequences has been given in [12];
results for circularized sequences can not in general be applied directly to
noncircularized sequences (see [12] and Corrigenda to [8]).
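The marginal property of circularized frequencies can be seen by counting tuples in a short sequence. The sketch below is illustrative: the sequence and the helper names `circular_counts` and `marginal` are hypothetical and chosen here.

```python
from collections import Counter


def circular_counts(seq, s):
    """Frequencies of s-tuples of consecutive observations in the circularized sequence."""
    n = len(seq)
    return Counter(tuple(seq[(i + j) % n] for j in range(s)) for i in range(n))


def marginal(counts, drop):
    """Sum the s-tuple frequencies over the first ('first') or the last ('last') coordinate."""
    out = Counter()
    for u, c in counts.items():
        out[u[1:] if drop == "first" else u[:-1]] += c
    return out


seq = [1, 2, 2, 1, 3, 1, 2, 3, 3, 1]     # hypothetical observed sequence
f3 = circular_counts(seq, 3)
f2 = circular_counts(seq, 2)
same_marginals = marginal(f3, "first") == f2 and marginal(f3, "last") == f2
```

Because the sequence wraps around, summing out either end of an $s$-tuple gives the same $(s-1)$-tuple frequencies, and the frequencies of every order add up to $n$. Neither statement holds exactly for a noncircularized sequence, where the tuples at the ends are missing.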

The following result has been presented in [18]: Consider an observed sequence
$\{X_1, X_2, \cdots, X_n\}$ from a positively regular Markov chain with constant
transition probability matrix $P_1 = (p_{ij})$, where the possible states are $1, 2, \cdots, a$.
Let $p_i$ denote the stationary probabilities, and let $k_s$ be the number of $s$-tuples
that are possible given $P_1$; e.g., if all $p_{ij} > 0$, then $k_s = a^s$. Let

$$\psi^2_{1,s} = \sum_{\mathbf{u}_s} [f(\mathbf{u}_s) - f_1(\mathbf{u}_s)]^2 / f_1(\mathbf{u}_s),$$

where $f_1(\mathbf{u}_s) = n\,p_{u_1} \prod_{i=1}^{s-1} p_{u_i u_{i+1}}$ is the expected value (asymptotically)
of $f(\mathbf{u}_s)$ in a new sequence of length $n$ given $H(P_1)$, and where the summation
is taken over the $k_s$ values of $\mathbf{u}_s$ where $f_1(\mathbf{u}_s) > 0$. (The $f_1(\mathbf{u}_s)$ given above
is not the exact expected value, but is an asymptotic approximation; similar
asymptotic approximations for expected values will be used throughout.) Then
(a) the statistics $\Delta\psi^2_{1,s} = \psi^2_{1,s} - \psi^2_{1,s-1}$ ($s \ge 2$) are asymptotically distributed
($n \to \infty$) as $\chi^2$ with $\Delta k_s = k_s - k_{s-1}$ degrees of freedom (d.f.), and (b) the
$\Delta^2\psi^2_{1,s} = \psi^2_{1,s} - 2\psi^2_{1,s-1} + \psi^2_{1,s-2}$ for $s \ge 3$ are asymptotically independent and
distributed as $\chi^2$ with $\Delta^2 k_s = k_s - 2k_{s-1} + k_{s-2}$ d.f.

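We shall now introduce $\phi^2$ statistics, which are related to, but different from,
the $\psi^2$ statistics. Let $\phi^2_{1,s} = 2\sum_{\mathbf{u}_s} f(\mathbf{u}_s) \log [f(\mathbf{u}_s)/f_1(\mathbf{u}_s)]$. Then, for $s \ge 2$,

$$\Delta\phi^2_{1,s} = \phi^2_{1,s} - \phi^2_{1,s-1} = 2\sum_{\mathbf{u}_s} f(\mathbf{u}_s) \log [f(\mathbf{u}_s)/\bar f_1(\mathbf{u}_s)] = L_{1,s},$$

where $\mathbf{u}_{s-1} = (u_1, u_2, \cdots, u_{s-1})$, and $\bar f_1(\mathbf{u}_s) = f(\mathbf{u}_{s-1})\,p_{u_{s-1}u_s}$ is the expected
value (asymptotically) of $f(\mathbf{u}_s)$ in a new sequence of length $n$ given $f(\mathbf{u}_{s-1})$ and
$H(P_1)$. For $s \ge 3$,

$$\Delta^2\phi^2_{1,s} = \Delta\phi^2_{1,s} - \Delta\phi^2_{1,s-1} = \Delta L_{1,s} = 2\sum_{\mathbf{u}_s} f(\mathbf{u}_s) \log [f(\mathbf{u}_s)/\hat f_{s-2}(\mathbf{u}_s)] = M_{s-2,s},$$

where $\mathbf{w}_{s-2} = (w_1, w_2, \cdots, w_{s-2})$, $\hat P_{s-2}$ is the maximum likelihood estimate of
the $(s-2)$th order transition probability matrix when $H_{s-2}$ is true (see [14]),
and $\hat f_{s-2}(\mathbf{u}_s) = f(\mathbf{u}_{s-1})f(\mathbf{w}_{s-1})/f(\mathbf{w}_{s-2})$ is the expected value (asymptotically) of
$f(\mathbf{u}_s)$ in a new sequence of length $n$ given $f(\mathbf{u}_{s-1})$ and $H(\hat P_{s-2})$. Let

$$K_s = 2\sum_{\mathbf{u}_s} f(\mathbf{u}_s) \log f(\mathbf{u}_s),$$

$$K_{1,2} = 2\sum_{\mathbf{u}_2} f(\mathbf{u}_2) \log p(\mathbf{u}_2),$$

and

$$K_{1,1} = 2\sum_{\mathbf{u}_1} f(\mathbf{u}_1) \log n p'(\mathbf{u}_1),$$

where $p(\mathbf{u}_2) = p_{u_1u_2}$ and $p'(\mathbf{u}_1) = p_{u_1}$. Then $\phi^2_{1,s} = K_s - (s-1)K_{1,2} - K_{1,1}$,
$\Delta\phi^2_{1,s} = \Delta K_s - K_{1,2}$, and $\Delta^2\phi^2_{1,s} = \Delta^2 K_s$. It can be seen that, given $H(P_1)$, the
statistics $\phi^2_{1,s}$ are asymptotically equivalent to the $\psi^2_{1,s}$, and that (a) the
$\Delta\phi^2_{1,s} = \Delta K_s - K_{1,2} = L_{1,s}$ ($s \ge 2$) are asymptotically $\chi^2$ with $\Delta k_s$ d.f., and (b) the
$\Delta^2\phi^2_{1,s} = \Delta L_{1,s} = \Delta^2 K_s = M_{s-2,s}$ are asymptotically independent and distributed
as $\chi^2$ with $\Delta^2 k_s$ d.f. (e.g., see methods in [1], [4], [14]).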
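The decomposition of $\phi^2_{1,s}$ into the $K$ quantities is an exact algebraic identity for circularized frequencies, and it can be checked numerically. The following is a sketch under assumptions made here: a two-state chain with a hypothetical transition matrix `P`, its stationary vector, and an arbitrary short sequence; the function names are illustrative.

```python
import math
from collections import Counter


def circular_counts(seq, s):
    """Frequencies of s-tuples in the circularized sequence."""
    n = len(seq)
    return Counter(tuple(seq[(i + j) % n] for j in range(s)) for i in range(n))


def phi2(seq, s, P, stationary):
    """2 * sum f log(f / f_1), with f_1 = n p_{u_1} prod p_{u_i u_{i+1}}."""
    n, total = len(seq), 0.0
    for u, f in circular_counts(seq, s).items():
        expected = n * stationary[u[0]]
        for a, b in zip(u, u[1:]):
            expected *= P[a][b]
        total += 2.0 * f * math.log(f / expected)
    return total


def K(seq, s):
    return 2.0 * sum(f * math.log(f) for f in circular_counts(seq, s).values())


def K12(seq, P):
    return 2.0 * sum(f * math.log(P[a][b])
                     for (a, b), f in circular_counts(seq, 2).items())


def K11(seq, stationary):
    n = len(seq)
    return 2.0 * sum(f * math.log(n * stationary[a])
                     for (a,), f in circular_counts(seq, 1).items())


P = [[0.7, 0.3], [0.4, 0.6]]                 # hypothetical first order transition matrix
stationary = [4.0 / 7.0, 3.0 / 7.0]          # its stationary probabilities
seq = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]   # hypothetical observations
gaps = [abs(phi2(seq, s, P, stationary)
            - (K(seq, s) - (s - 1) * K12(seq, P) - K11(seq, stationary)))
        for s in (2, 3, 4)]
```

The identity relies on the circularization: each of the $s - 1$ adjacent pairs inside an $s$-tuple has the same marginal frequencies $f(\mathbf{u}_2)$, so the transition terms collapse to $(s-1)K_{1,2}$.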





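Let

$$G_{1,s} = \sum_{\mathbf{u}_s} [f(\mathbf{u}_s) - \bar f_1(\mathbf{u}_s)]^2 / \bar f_1(\mathbf{u}_s),$$

$$F_{s-2,s} = \sum_{\mathbf{u}_s} [f(\mathbf{u}_s) - \hat f_{s-2}(\mathbf{u}_s)]^2 / \hat f_{s-2}(\mathbf{u}_s),$$

where $\bar f_1(\mathbf{u}_s) = f(\mathbf{u}_{s-1})\,p_{u_{s-1}u_s}$ and $\hat f_{s-2}(\mathbf{u}_s) = f(\mathbf{u}_{s-1})f(\mathbf{w}_{s-1})/f(\mathbf{w}_{s-2})$.
Then it can be seen (with methods in [1], [4]) that, given $H(P_1)$, the $G_{1,s}$ are
asymptotically equivalent to $L_{1,s} = \Delta\phi^2_{1,s}$ and thus to $\Delta\psi^2_{1,s}$. Hence, the
$\Delta G_{1,s} = G_{1,s} - G_{1,s-1}$ are asymptotically equivalent, given $H(P_1)$, to $\Delta^2\psi^2_{1,s}$. Also, the
$F_{s-2,s}$ are asymptotically equivalent, given $H(P_1)$, to $M_{s-2,s} = \Delta^2\phi^2_{1,s}$ and thus
to $\Delta^2\psi^2_{1,s}$.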

The statistics $L_{1,s}$ are the LR statistics for testing the null hypothesis $H(P_1)$
within the alternate hypothesis $H_{s-1}$ (see [2]). Although $G_{1,s}$ and $\Delta\psi^2_{1,s}$ are
asymptotically equivalent, given $H(P_1)$, to $L_{1,s}$, it can be seen that, in the case where
$H(P_1)$ is not true, $G_{1,s}$ and $\Delta\psi^2_{1,s}$ are asymptotically equivalent to $L_{1,s}$ only in the
special sense that the usual $\chi^2$ goodness of fit statistic for the standard test of
a simple null hypothesis concerning a multinomial distribution is considered to
be asymptotically equivalent (even when this null hypothesis is not true) to the
usual LR statistic for this hypothesis (see Section 6 of [1]); i.e., if the null
hypothesis $H(P_1)$ is not true, these statistics will be asymptotically equivalent only
if the true hypothesis approaches, in a certain sense, $H(P_1)$ at a sufficiently fast
rate as $n \to \infty$. The relative advantages and disadvantages of $L_{1,s}$, $\Delta\psi^2_{1,s}$, and
$G_{1,s}$ as tests of $H(P_1)$ will not be discussed here, since such discussions for
somewhat related problems appear in [1], [5], and [11].

The statistics $M_{s-2,s}$ are the LR statistics for testing the null hypothesis $H_{s-2}$
within $H_{s-1}$, and their asymptotic distribution, under $H_{s-2}$, is $\chi^2$ with $\Delta^2 k_s$ d.f.
(see [8], [16]). Although $\Delta^2\psi^2_{1,s}$ and $\Delta G_{1,s}$ are asymptotically equivalent, given
$H(P_1)$, to $M_{s-2,s}$, it can be seen that the former statistics are not asymptotically
equivalent, given $H_{s-2}$, to $M_{s-2,s}$, and their asymptotic distribution will depend
on $P_1$ (which is used in the computation of $\Delta^2\psi^2_{1,s}$ and $\Delta G_{1,s}$) and on the particular
system $P'_{s-2}$ of transition probabilities that is true when $H_{s-2}$ is true ($P'_{s-2}$ can
be viewed as a particular $a^{s-2} \times a^{s-2}$ matrix of transition probabilities describing
a given Markov chain of order $s - 2$); e.g., for some $P_1$ that differ from $P'_{s-2}$,
the statistics $\Delta^2\psi^2_{1,s}$ will converge in probability, given $H(P'_{s-2})$, to infinity even
though the null hypothesis $H_{s-2}$ is true (see [15]). Thus, the asymptotic distribution,
given $H_{s-2}$, of the statistics $\Delta^2\psi^2_{1,s}$ and $\Delta G_{1,s}$ will depend on unknown values
of the parameters; these statistics can not be used to test the null hypothesis
$H_{s-2}$ in the same simple manner as when the LR statistic $M_{s-2,s}$ is used. However,
it can be seen that $F_{s-2,s}$ is equivalent, given $H_{s-2}$, to $M_{s-2,s}$; and thus can
be used to test $H_{s-2}$ within $H_{s-1}$ (see [12], [13]).
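The pair $M_{s-2,s}$ and $F_{s-2,s}$ needs only tuple frequencies, not a specified transition matrix. The sketch below is a hedged illustration: the sequence is hypothetical, the function names are chosen here, and the expected frequency is taken as $f(\mathbf{u}_{s-1})f(\mathbf{w}_{s-1})/f(\mathbf{w}_{s-2})$ with $\mathbf{w}_{s-1}$ the last $s-1$ coordinates and $\mathbf{w}_{s-2}$ the middle $s-2$ coordinates of the $s$-tuple.

```python
import math
from collections import Counter


def circular_counts(seq, s):
    """Frequencies of s-tuples in the circularized sequence."""
    n = len(seq)
    return Counter(tuple(seq[(i + j) % n] for j in range(s)) for i in range(n))


def K(seq, s):
    return 2.0 * sum(f * math.log(f) for f in circular_counts(seq, s).values())


def order_statistics(seq, s):
    """Return (M, F) comparing order s-2 with order s-1; requires s >= 3."""
    f_s = circular_counts(seq, s)
    f_s1 = circular_counts(seq, s - 1)
    f_s2 = circular_counts(seq, s - 2)
    M = F = 0.0
    for head, f_head in f_s1.items():          # first s-1 coordinates
        middle = head[1:]
        for tail, f_tail in f_s1.items():      # last s-1 coordinates
            if tail[:-1] != middle:
                continue
            expected = f_head * f_tail / f_s2[middle]
            observed = f_s.get(head + tail[-1:], 0)
            F += (observed - expected) ** 2 / expected
            if observed > 0:                   # empty cells contribute nothing to M
                M += 2.0 * observed * math.log(observed / expected)
    return M, F


seq = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]    # hypothetical observations
M, F = order_statistics(seq, 3)
second_difference = K(seq, 3) - 2.0 * K(seq, 2) + K(seq, 1)
```

For circularized frequencies `M` should agree with the second difference $\Delta^2 K_s$ exactly, while `F` is the corresponding goodness of fit form; the two are only asymptotically equivalent, so they need not be close for a sequence this short.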


3. The general case. Stepanow [18] mentions that, given $H(P_1)$, the asymptotic
independence of the $\Delta^2\psi^2_{1,s}$ statistics leads to the fact that the statistics
$\Delta\psi^2_{1,s} - \Delta\psi^2_{1,t}$ ($s > t \ge 2$) are asymptotically $\chi^2$ with $\Delta k_s - \Delta k_t$ d.f. It can also be seen,
given $H_{t-1}$, that the $\Delta^2 K_j$ statistics (for $t < j \le s$) are asymptotically independent
(see [4], [14]); thus the $\Delta K_s - \Delta K_t$ statistics, given $H_{t-1}$, are asymptotically
$\chi^2$ with $\Delta k_s - \Delta k_t$ d.f. (see [8]). To test $H_{t-1}$ within $H_{s-1}$, the $\Delta K_s - \Delta K_t$ are
the LR statistics (see [8]). The $\Delta\psi^2_{1,s} - \Delta\psi^2_{1,t}$, given $H_{t-1}$, are not asymptotically
equivalent to the LR statistics, and can not serve as a test of $H_{t-1}$ (see [15]).

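We have that $\Delta K_s - \Delta K_t = 2\sum_{\mathbf{u}_s} f(\mathbf{u}_s) \log [f(\mathbf{u}_s)/\hat f_{t-1}(\mathbf{u}_s)] = M_{t-1,s}$, where
$\mathbf{w}_t = (w_1, w_2, \cdots, w_t) = (u_{s-t+1}, u_{s-t+2}, \cdots, u_s)$, $\mathbf{w}_{t-1} = (w_1, w_2, \cdots, w_{t-1})$,
and $\hat f_{t-1}(\mathbf{u}_s) = f(\mathbf{u}_{s-1})f(\mathbf{w}_t)/f(\mathbf{w}_{t-1})$ is the expected value (asymptotically) of
$f(\mathbf{u}_s)$ in a new sequence of length $n$ given $f(\mathbf{u}_{s-1})$ and $H(\hat P_{t-1})$. Let
$F_{t-1,s} = \sum_{\mathbf{u}_s} [f(\mathbf{u}_s) - \hat f_{t-1}(\mathbf{u}_s)]^2/\hat f_{t-1}(\mathbf{u}_s)$. Then, it can be seen that, given $H_{t-1}$, the
statistics $F_{t-1,s}$ are asymptotically equivalent to
$M_{t-1,s} = \Delta K_s - \Delta K_t = \Delta\phi^2_{1,s} - \Delta\phi^2_{1,t}$.

Let

$$\hat\psi^2_{t-1,s} = \sum_{\mathbf{u}_s} [f(\mathbf{u}_s) - \tilde f_{t-1}(\mathbf{u}_s)]^2 / \tilde f_{t-1}(\mathbf{u}_s),$$

where (for $s > t \ge 1$)

$$\mathbf{u}_t = (u_1, u_2, \cdots, u_t), \quad \mathbf{u}_{i,t} = (u_i, u_{i+1}, \cdots, u_{i+t-1}), \quad \mathbf{v}_{i,t-1} = (u_i, u_{i+1}, \cdots, u_{i+t-2}),$$

and

$$\tilde f_{t-1}(\mathbf{u}_s) = f(\mathbf{u}_t) \prod_{i=2}^{s-t+1} f(\mathbf{u}_{i,t}) / f(\mathbf{v}_{i,t-1})$$

is the expected value (asymptotically) of $f(\mathbf{u}_s)$ in a new sequence of length $n$
given $f(\mathbf{u}_t)$ and $H(\hat P_{t-1})$. Then, given $H_{t-1}$, the statistics $\hat\psi^2_{t-1,s}$ are asymptotically
equivalent to

$$\hat\phi^2_{t-1,s} = 2\sum_{\mathbf{u}_s} f(\mathbf{u}_s) \log [f(\mathbf{u}_s)/\tilde f_{t-1}(\mathbf{u}_s)] = K_s - K_{t-1} - (s - t + 1)(\Delta K_t).$$

Since $\Delta\hat\phi^2_{t-1,s} = \Delta K_s - \Delta K_t$, the $\Delta\hat\phi^2_{t-1,s}$ or the $\Delta\hat\psi^2_{t-1,s}$, as well as the $F_{t-1,s}$,
can be used to test $H_{t-1}$ within $H_{s-1}$.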

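Consider now the hypothesis $H(P_t)$ that the positively regular Markov chain
is of the $t$th order and governed by the system of transition probabilities $P_t$
(see [2]). Let $\Pr\{u_{t+1} \mid u_1, u_2, \cdots, u_t\} = p_{u_1 u_2 \cdots u_{t+1}} = p(\mathbf{u}_{t+1})$ denote the
transition probability that the $j$-th observation in the sequence will be $u_{t+1}$, given that
the $(j-1)$-th, $(j-2)$-th, $\cdots$, $(j-t)$-th observations were $u_t, u_{t-1}, u_{t-2}, \cdots, u_1$
respectively, and let $p'(\mathbf{u}_t)$ denote the stationary (absolute) probability for
$\mathbf{u}_t$. Let $\phi^2_{t,s} = 2\sum_{\mathbf{u}_s} f(\mathbf{u}_s) \log [f(\mathbf{u}_s)/f_t(\mathbf{u}_s)]$, where (for $s > t \ge 1$)
$f_t(\mathbf{u}_s) = n p'(\mathbf{u}_t) \prod_{i=1}^{s-t} p_{u_i u_{i+1} \cdots u_{i+t}}$ is the expected value (asymptotically) of $f(\mathbf{u}_s)$ in a new
sequence of length $n$ given $H(P_t)$. Then

$$\Delta\phi^2_{t,s} = \phi^2_{t,s} - \phi^2_{t,s-1} = 2\sum_{\mathbf{u}_s} f(\mathbf{u}_s) \log [f(\mathbf{u}_s)/\bar f_t(\mathbf{u}_s)] = L_{t,s},$$

where $\bar f_t(\mathbf{u}_s) = f(\mathbf{u}_{s-1})\,p_{u_{s-t} u_{s-t+1} \cdots u_s}$ is the expected value (asymptotically) of
$f(\mathbf{u}_s)$ in a new sequence of length $n$ given $f(\mathbf{u}_{s-1})$ and $H(P_t)$; and

$$\Delta^2\phi^2_{t,s} = \Delta\phi^2_{t,s} - \Delta\phi^2_{t,s-1} = \Delta L_{t,s} = 2\sum_{\mathbf{u}_s} f(\mathbf{u}_s) \log [f(\mathbf{u}_s)/\hat f_{s-2}(\mathbf{u}_s)] = M_{s-2,s} = \Delta^2\phi^2_{1,s} = \Delta^2\hat\phi^2_{t,s}.$$

Writing

$$\phi^2_{t,s} = K_s - (s - t)K_{t,t+1} - K_{t,t},$$

where

$$K_{t,t+1} = 2\sum_{\mathbf{u}_{t+1}} f(\mathbf{u}_{t+1}) \log p(\mathbf{u}_{t+1}), \qquad K_{t,t} = 2\sum_{\mathbf{u}_t} f(\mathbf{u}_t) \log n p'(\mathbf{u}_t),$$

we see that

$$\Delta\phi^2_{t,s} = \Delta K_s - K_{t,t+1}.$$

Let $\psi^2_{t,s} = \sum_{\mathbf{u}_s} [f(\mathbf{u}_s) - f_t(\mathbf{u}_s)]^2/f_t(\mathbf{u}_s)$, and $G_{t,s} = \sum_{\mathbf{u}_s} [f(\mathbf{u}_s) - \bar f_t(\mathbf{u}_s)]^2/\bar f_t(\mathbf{u}_s)$.
Then it can be seen that, given $H(P_t)$, the $\Delta\psi^2_{t,s}$, $\Delta\phi^2_{t,s} = L_{t,s}$ and $G_{t,s}$ are all
asymptotically equivalent, and each is asymptotically distributed as $\chi^2$ with
$\Delta k_s$ d.f.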

The reader will note that $k_s$ and the statistics mentioned in the preceding
paragraph depend on $P_t$. In the case where the null hypothesis tested is $H_t$,
the particular statistics mentioned earlier herein appropriate for such a test do
not depend on $P_t$, but their distribution does since $k_s$ does. If the null hypothesis
to be tested is $H_t$ (and $P_t$ is not specified), then the value of $k_s$ to be used can
be estimated consistently from the observed number of $\mathbf{u}_s$ where $f(\mathbf{u}_s) > 0$. (In
the general case where the chain may contain some transient states, but where
only the recurrent states are of interest and the transient states are not, the
condition $f(\mathbf{u}_s) > 0$ should be applied to the sequence obtained by omitting all
observations before the first one that is in a recurrent state (see [4]); the recurrent
states can also be estimated consistently from the observed sequence.)


4. The distribution of the $\hat\psi^2$ and $\psi^{*2}$ statistics. Conjectures concerning the
asymptotic distribution of $\hat\psi^2_{t,s}$ and $\psi^{*2}_{t,s}$ were proposed in [10] and modified forms
of these conjectures were proved in [4] and [14]. We now present the following
generalization of these results:

(A) The asymptotic distribution of $\hat\psi^2_{t,s}$, given $H_t$, is $\mathop{*}_{\lambda=1}^{s-t-1} K_{g(\lambda)}(x/\lambda)$, where
$*$ denotes convolution, $g(\lambda) = \Delta^2 k_{s+1-\lambda}$, and $K_g(x)$ is the $\chi^2$ distribution with
$g$ degrees of freedom.

(B) Let $\psi^{*2}_{t,s} = \sum_{\mathbf{u}_s} [f(\mathbf{u}_s) - f^*_t(\mathbf{u}_s)]^2/f^*_t(\mathbf{u}_s)$, where

$$f^*_t(\mathbf{u}_s) = f(\mathbf{u}_t) \prod_{i=1}^{s-t} p_{u_i u_{i+1} \cdots u_{i+t}}$$

is the expected value (asymptotically) of $f(\mathbf{u}_s)$ in a new sequence of length $n$
given $H(P_t)$ and $f(\mathbf{u}_t)$. Then the asymptotic distribution ($n \to \infty$) of $\psi^{*2}_{t,s}$,
given $H(P_t)$, is $\mathop{*}_{\lambda=1}^{s-t-1} K_{g(\lambda)}(x/\lambda) * K_{h(t)}[x/(s - t)]$, where $h(t) = \Delta k_{t+1}$.

Statement (A) can be seen to follow from the fact that $\hat\psi^2_{t,s}$ is asymptotically
equivalent, given $H_t$, to

(4.1) $$\hat\phi^2_{t,s} = K_s - K_t - (s - t)\Delta K_{t+1} = \sum_{j=1}^{s-t-1} j\,\Delta^2 K_{s+1-j},$$

where the statistics $\Delta^2 K_s$ (for $s > t + 1$) are asymptotically independent (see
[4], [14]). Statement (B) can be seen to follow from the fact that $\psi^{*2}_{t,s}$ is asymptotically
equivalent, under $H(P_t)$, to

(4.2) $$\phi^{*2}_{t,s} = K_s - K_t - (s - t)K_{t,t+1} = \hat\phi^2_{t,s} + (s - t)\Delta\phi^2_{t,t+1},$$

where $\hat\phi^2_{t,s}$ and $\Delta\phi^2_{t,t+1}$ are asymptotically independent (see [4], [14]). We also
note that the asymptotic distribution of $\psi^{*2}_{t,s}$ (or $\phi^{*2}_{t,s}$) is different from that of
$\psi^2_{t,s}$; but the asymptotic distributions, given $H(P_t)$, of $\Delta\psi^2_{t,s}$ and $\Delta\psi^{*2}_{t,s}$ are identical
since $\Delta\phi^2_{t,s} = \Delta\phi^{*2}_{t,s}$.

The results presented here were for $t \ge 1$. The case where $t = 0$ can be treated
in a similar fashion (see [3], [9], [10], [14]).


5. Some generalized statistics and their distributions. From (4.1) we see
that, for $s = t + 2$, $\hat\phi^2_{t,s}$ and $\hat\psi^2_{t,s}$ are asymptotically equivalent, given $H_t$, to
$\Delta^2 K_s = M_{t,s}$, the LR statistic for testing the null hypothesis $H_t$ within the
alternate hypothesis $H_{s-1}$ (i.e., $M_{t,s} = -2 \log \lambda_{t,s-1}$, where $\lambda_{t,s-1}$ is the ratio of
the maximum likelihood given $H_t$ to that given $H_{s-1}$). This relationship between
$M_{t,s}$ and $\hat\psi^2_{t,s}$ (and $\hat\phi^2_{t,s}$) does not hold for $s > t + 2$. Also, for $s = t + 1$, $\phi^{*2}_{t,s}$ and
$\psi^{*2}_{t,s}$ are asymptotically equivalent, under $H(P_t)$, to $\Delta\phi^2_{t,s} = L_{t,s}$, the LR statistic
for testing the null hypothesis $H(P_t)$ within $H_{s-1}$. This relationship between
$L_{t,s}$ and $\psi^{*2}_{t,s}$ (and $\phi^{*2}_{t,s}$) does not hold for $s > t + 1$. We shall now present, for
$s \ge t + 2$, a generalized statistic $\hat\phi^2_{t,s;r}$ that will include both $\hat\phi^2_{t,s}$ and $M_{t,s}$ as
special cases, and a statistic $\hat\psi^2_{t,s;r}$ that will be asymptotically equivalent, given
$H_t$, to $\hat\phi^2_{t,s;r}$. Also, we shall present, for $s \ge t + 1$, a generalized statistic $\phi^{*2}_{t,s;r}$
that will include both $\phi^{*2}_{t,s}$ and $L_{t,s}$ as special cases, and a statistic $\psi^{*2}_{t,s;r}$ that will
be asymptotically equivalent, given $H(P_t)$, to $\phi^{*2}_{t,s;r}$. Finally, a generalized statistic
$M_{t,s;r}$ (different from $\hat\phi^2_{t,s;r}$) will be presented that will include $M_{t,s}$ as a special
case, and the asymptotic distribution of each of these generalized statistics will
be investigated.

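Let

$$\hat\phi^2_{t,s;r} = 2\sum_{\mathbf{u}_s} f(\mathbf{u}_s) \log [f(\mathbf{u}_s)/\tilde f_{t;r}(\mathbf{u}_s)],$$

$$\hat\psi^2_{t,s;r} = \sum_{\mathbf{u}_s} [f(\mathbf{u}_s) - \tilde f_{t;r}(\mathbf{u}_s)]^2 / \tilde f_{t;r}(\mathbf{u}_s),$$

$$\phi^{*2}_{t,s;r} = 2\sum_{\mathbf{u}_s} f(\mathbf{u}_s) \log [f(\mathbf{u}_s)/f^*_{t;r}(\mathbf{u}_s)],$$

$$\psi^{*2}_{t,s;r} = \sum_{\mathbf{u}_s} [f(\mathbf{u}_s) - f^*_{t;r}(\mathbf{u}_s)]^2 / f^*_{t;r}(\mathbf{u}_s),$$

where

$$\tilde f_{t;r}(\mathbf{u}_s) = f(\mathbf{u}_{t+r}) \prod_{i=1}^{s-t-r} f(\mathbf{u}_{i,t+1;r}) / f(\mathbf{u}_{i,t;r}),$$

$$f^*_{t;r}(\mathbf{u}_s) = f(\mathbf{u}_{t+r}) \prod_{i=1}^{s-t-r} p_{u_{i+r} u_{i+r+1} \cdots u_{i+r+t}},$$

$\mathbf{u}_{t+r} = (u_1, u_2, \cdots, u_{t+r})$, $\mathbf{u}_{i,t+1;r} = (u_{i+r}, u_{i+r+1}, \cdots, u_{i+r+t})$,
and
$\mathbf{u}_{i,t;r} = (u_{i+r}, u_{i+r+1}, \cdots, u_{i+r+t-1})$ for $0 \le r < s - t$.
Then for $r = 0$,

$$\hat\phi^2_{t,s;r} = \hat\phi^2_{t,s}, \quad \hat\psi^2_{t,s;r} = \hat\psi^2_{t,s}, \quad \phi^{*2}_{t,s;r} = \phi^{*2}_{t,s}, \quad \psi^{*2}_{t,s;r} = \psi^{*2}_{t,s};$$

for $r = s - t - 1$,

$$\hat\phi^2_{t,s;r} = M_{t,s}, \quad \hat\psi^2_{t,s;r} = F_{t,s}, \quad \phi^{*2}_{t,s;r} = L_{t,s}, \quad \psi^{*2}_{t,s;r} = G_{t,s}.$$

Let

$$M_{t,s;r} = 2\sum_{\mathbf{u}_s} f(\mathbf{u}_s) \log [f(\mathbf{u}_s)/\hat f_{t;r}(\mathbf{u}_s)],$$

$$F_{t,s;r} = \sum_{\mathbf{u}_s} [f(\mathbf{u}_s) - \hat f_{t;r}(\mathbf{u}_s)]^2 / \hat f_{t;r}(\mathbf{u}_s),$$

where

$$\hat f_{t;r}(\mathbf{u}_s) = f(\mathbf{u}_{s-1-r})\,f(\mathbf{w}_{t+1+r}) / f(\mathbf{w}_t),$$

$\mathbf{w}_{t+1+r} = (w_1, w_2, \cdots, w_{t+1+r}) = (u_{s-t-r}, u_{s-t-r+1}, \cdots, u_s)$,
and
$\mathbf{w}_t = (w_1, w_2, \cdots, w_t)$ for $0 \le r \le s - t - 2$.

Then for $r = 0$, $M_{t,s;r} = M_{t,s}$ and $F_{t,s;r} = F_{t,s}$; for $r = s - t - 2$, $M_{t,s;r} = M_{t,s}$
and $F_{t,s;r} = F'_{t,s}$, where $F'_{t,s}$ is $F_{t,s}$ computed for the sequence $\{X_n, X_{n-1}, \cdots, X_1\}$
circularized rather than for the sequence $\{X_1, X_2, \cdots, X_n\}$. The statistic
$M_{t,s;r}$ (or $F_{t,s;r}$) can be seen to be the sum of the LR statistics (or the goodness
of fit statistics) for testing "independence" in each of $k_t$ "contingency tables"
(i.e., $M_{t,s;r}$ is the product of the ratios of the maximum likelihood given
independence to that given "nonindependence" in each "contingency table"
when normed in the usual way; viz., $-2$ times the log of this product) obtained
by "splitting" each $s$-tuple $\mathbf{u}_s$ into a $(s - t - 1 - r)$-tuple $(u_1, u_2, \cdots, u_{s-t-1-r})$,
a $t$-tuple $(u_{s-t-r}, u_{s-t-r+1}, \cdots, u_{s-r-1})$, and a $(1 + r)$-tuple $(u_{s-r}, u_{s-r+1}, \cdots, u_s)$;
for each $t$-tuple a "contingency table" can be formed where the "expected
value" of the observed cell entry $f(\mathbf{u}_s)$ is $\hat f_{t;r}(\mathbf{u}_s)$ under the assumption of
independence in the table (see [13], [14]).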

We shall now present simple derivations for the asymptotic distributions of
the generalized statistics using a similar approach to that described in Section
4. We have that

$$\hat\phi^2_{t,s;r} = K_s - K_{t+r} - (s - t - r)\Delta K_{t+1} = \sum_{j=1}^{s-t-1} d(j)\,\Delta^2 K_{s+1-j},$$

where

$$d(j) = \begin{cases} j & \text{for } 0 \le j \le s - t - r, \\ s - t - r & \text{for } s - t - r \le j \le s - t - 1. \end{cases}$$

Therefore, the asymptotic distribution ($n \to \infty$) of $\hat\phi^2_{t,s;r}$, given $H_t$, is

$$\mathop{*}_{j=1}^{s-t-1} K_{g(j)}[x/d(j)] = \mathop{*}_{\lambda=1}^{s-t-r-1} K_{g(\lambda)}(x/\lambda) * K_{h(t+r)-h(t)}[x/(s - t - r)].$$

This will also be the asymptotic distribution of $\hat\psi^2_{t,s;r}$, under $H_t$, since $\hat\psi^2_{t,s;r}$ and
$\hat\phi^2_{t,s;r}$ are asymptotically equivalent under $H_t$.
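By a similar approach, we see that

$$M_{t,s;r} = K_s - K_{s-1-r} - (K_{t+1+r} - K_t) = \sum_{j=1}^{s-t-1} c(j)\,\Delta^2 K_{s+1-j},$$

where

$$c(j) = \begin{cases} j & \text{for } 0 \le j \le v, \\ v & \text{for } v \le j \le s - t - v, \\ s - t - j & \text{for } s - t - v \le j, \end{cases}$$

and $v = \min[r + 1, s - t - r - 1]$. Therefore, the asymptotic distribution
($n \to \infty$) of $M_{t,s;r}$ (and $F_{t,s;r}$), given $H_t$, is

$$\mathop{*}_{j=1}^{s-t-1} K_{g(j)}[x/c(j)]$$

(see [14]).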
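We also see that

$$\phi^{*2}_{t,s;r} = K_s - K_{t+r} - (s - t - r)K_{t,t+1} = \hat\phi^2_{t,s;r} + (s - t - r)\Delta\phi^2_{t,t+1}.$$

Therefore the asymptotic distribution of $\phi^{*2}_{t,s;r}$ (and $\psi^{*2}_{t,s;r}$), given $H(P_t)$, is

$$\mathop{*}_{j=1}^{s-t-1} K_{g(j)}[x/d(j)] * K_{h(t)}[x/(s - t - r)] = \mathop{*}_{\lambda=1}^{s-t-r-1} K_{g(\lambda)}(x/\lambda) * K_{h(t+r)}[x/(s - t - r)].$$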








If $P_t$ is not completely specified but is a function $P_t(\alpha)$ of a vector parameter
$\alpha = (\alpha_1, \alpha_2, \cdots, \alpha_v)$ that ranges over an open subset of $v$-dimensional Euclidean
space, and if this function satisfies certain regularity conditions (see [4]), then
it can be seen that the statistic $\phi^{*2}_{t,s;r}(\hat\alpha)$ (or $\psi^{*2}_{t,s;r}(\hat\alpha)$), obtained by replacing $P_t$
by its maximum likelihood estimate $P_t(\hat\alpha)$ in the computation of $\phi^{*2}_{t,s;r}$ (or $\psi^{*2}_{t,s;r}$),
will have the same asymptotic distribution as $\phi^{*2}_{t,s;r}$ except that the degrees of
freedom $h(t + r)$ should be replaced by $h(t + r) - v$. If $P_t$ is the transition
probability matrix for a completely unspecified positively regular $t$th order
Markov chain, then $v = \Delta k_{t+1} = h(t)$, $\phi^{*2}_{t,s;r}(\hat\alpha) = \hat\phi^2_{t,s;r}$, $\psi^{*2}_{t,s;r}(\hat\alpha) = \hat\psi^2_{t,s;r}$, and
the asymptotic distribution, under $H_t$, of these statistics was given earlier
herein. This result is closely related to and generalizes the asymptotic
distributions in [4] for $r = 0$ and $s - t - 1$ when $t = 0$ or $1$.

This investigation of the asymptotic distributions of various generalized
statistics, under particular null hypotheses, indicates that each of these statistics
is asymptotically equivalent (under a particular hypothesis) to a weighted sum
of the LR statistics $\Delta^2 K_s, \Delta^2 K_{s-1}, \cdots, \Delta^2 K_{t+2}, \Delta\phi^2_{t,t+1}$ (or $\Delta\phi^2_{t,t+1}(\hat\alpha)$, where
$\phi^2_{t,t+1}(\hat\alpha)$ is defined as $\phi^2_{t,t+1}$ with $P_t$ replaced by $P_t(\hat\alpha)$). The particular generalized
statistic that will be appropriate for a given problem will depend in part on the
appropriate weighting of the LR statistics, which will in turn depend on the
specific null and alternate hypotheses considered (see [4], [14]).

In closing, we point out that the asymptotic mean values of $\hat\phi^2_{t,s;r}$ and
$M_{t,s;r}$ under $H_t$, and of $\phi^{*2}_{t,s;r}$ under $H(P_t)$, can be computed directly by reference
to the decomposition of the various statistics in terms of the $K$'s. When all transition
probabilities are positive, these mean values are
$a^{t+r}(a^{s-t-r} - 1) - (s - t - r)a^t(a - 1)$, $a^t(a^{r+1} - 1)(a^{s-t-r-1} - 1)$, and $a^{t+r}(a^{s-t-r} - 1)$ for $\hat\phi^2_{t,s;r}$, $M_{t,s;r}$,
and $\phi^{*2}_{t,s;r}$, respectively, and they can be given some interpretation in terms of
the asymptotic mean values of certain corresponding LR statistics computed from
a set of $a^t$ (and $a^{t+r}$) independent "contingency tables" (see [13], [14]).


REFERENCES 


[1] T. W. Anderson and Leo A. Goodman, "Statistical inference about Markov chains,"
Ann. Math. Stat., Vol. 28 (1957), pp. 89-110.

[2] M. S. Bartlett, "The frequency goodness of fit test for probability chains," Proc.
Cambridge Philos. Soc., Vol. 47 (1951), pp. 86-95.

[3] P. Billingsley, "Asymptotic distributions of two goodness of fit criteria," Ann. Math.
Stat., Vol. 27 (1956), pp. 1123-9.

[4] P. Billingsley, "On testing Markov chains," presented at the meetings of the Institute
of Mathematical Statistics, Sept. 10-13, 1957.

[5] W. G. Cochran, "The $\chi^2$-test of goodness of fit," Ann. Math. Stat., Vol. 23 (1952), pp.
315-45.

[6] R. Dawson and I. J. Good, "Exact Markov probabilities from oriented linear graphs,"
Ann. Math. Stat., Vol. 28 (1957), pp. 946-56.

[7] C. Derman, "Some asymptotic distribution theory for Markov chains with a
denumerable number of states," Biometrika, Vol. 43 (1956), pp. 285-94.

[8] I. J. Good, "The likelihood ratio test for Markov chains," Biometrika, Vol. 42 (1955),
pp. 531-3; "Corrigenda," Vol. 44 (1957), p. 301.

[9] I. J. Good, "On the weighted combination of significance tests," J. Roy. Stat. Soc.
Series B (Methodological), Vol. 17 (1955), pp. 264-5.

[10] I. J. Good, "Review of P. Billingsley's 'Asymptotic distributions of two goodness of fit
criteria'," Math. Reviews, Vol. 18 (1957), p. 607.

[11] I. J. Good, "Saddle point methods for the multinomial distribution," Ann. Math. Stat.,
Vol. 28 (1957), pp. 861-81.

[12] Leo A. Goodman, "Simplified runs tests and likelihood ratio tests for Markov chains,"
Biometrika, Vol. 45 (1958), pp. 181-97.

[13] Leo A. Goodman, "Exact probabilities and asymptotic relationships for some statistics
from mth order Markov chains," Ann. Math. Stat., Vol. 29 (1958), pp. 476-90.

[14] Leo A. Goodman, "Asymptotic distributions of 'psi-squared' goodness of fit criteria
for mth order Markov chains," Ann. Math. Stat., Vol. 29 (1958), pp. 1123-33.

[15] Leo A. Goodman, "A note on Stepanow's tests for Markov chains," Teoriya
Veroyatnostei i ee Primeneniya (The Theory of Probability and Its Applications), Vol. 4
(1959), to be published.

[16] P. G. Hoel, "A test for Markov chains," Biometrika, Vol. 41 (1954), pp. 430-3.

[17] G. M. Jenkins, "Contribution to the discussion on 'Regression analysis of binary
sequences' by D. R. Cox," J. Roy. Stat. Soc. Series B (Methodological), Vol. 20
(1958), pp. 238-9.

[18] V. E. Stepanow, "Some statistical tests for Markov chains," (in Russian), Teoriya
Veroyatnostei i ee Primeneniya (The Theory of Probability and Its Applications),
Vol. 2 (1957), pp. 143-4.

[19] P. Whittle, "Some distribution and moment formulas for Markov chains," J. Roy.
Stat. Soc. Series B (Methodological), Vol. 17 (1955), pp. 235-42.





CONSENSUS OF SUBJECTIVE PROBABILITIES: THE 
PARI-MUTUEL METHOD 


By Edmund Eisenberg and David Gale
The Rand Corporation and Brown University


A certain probability space is contemplated by a group of m individuals, each 
of whom endows it with his own subjective probability distribution. Suppose, 
now, that we wish to form a distribution which represents, in some sense, a con- 
sensus of those individual distributions. Various possibilities suggest themselves: 
the average, the convolution—but wait. There actually exists a popular institu- 
tion which, theoretically at least, does perform just such an aggregation of 
personal probabilities. We refer to the pari-mutuel method of betting on horse 
races. In this system the final "track's odds" on a given horse are proportional
to the amount bet on the horse. We shall here investigate the type of consensus 
given by this mechanism, which turns out to be quite different from any of the 
obvious aggregation schemes that might occur to one. 

In formulating the pari-mutuel model we assume the $m$ individuals involved
are bettors, labeled $B_1, \ldots, B_m$, concerned with a race involving $n$ horses,
labeled $H_1, \ldots, H_n$. We assume further that each $B_i$, after careful study of the
form sheets, the condition of the track, and other relevant material, has arrived
at an estimate of the relative merits of each of the $H_j$'s which he expresses in
quantitative terms. Specifically, we are given an $m \times n$ subjective probability
matrix $P = (p_{ij})$ where $p_{ij}$ is the probability, in the opinion of $B_i$, that $H_j$ will
win the race.

Having determined his subjective probability distribution, $B_i$ will now bet the
amount $b_i$, a fixed positive number called $B_i$'s budget, in a way which maximizes
his subjective expectation. This means, of course, that $B_i$ will not necessarily bet
the whole amount $b_i$ on that $H_j$ for which $p_{ij}$ is largest. In general, $B_i$ will "bet
the odds," that is, he will wait until the final track odds, or more conveniently,
track probabilities, are announced. If these are $\pi_1, \ldots, \pi_n$, he will examine the
ratios $p_{ij}/\pi_j$ and distribute $b_i$ among those $H_j$ for which this ratio is a maximum.
We shall refer to this course of action as $B_i$'s strategy.
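
The strategy just described can be illustrated with a minimal sketch. The function name `best_horses`, the tolerance, and the numbers are assumptions made for illustration only; the note itself gives no code. Given one bettor's subjective probabilities and the announced track probabilities, the sketch returns the horses on which the ratio $p_{ij}/\pi_j$ is a maximum.

```python
def best_horses(p_row, track, tol=1e-12):
    """Indices j at which p_row[j] / track[j] is (numerically) a maximum.

    p_row : one bettor's subjective probabilities (hypothetical input)
    track : announced track probabilities, all assumed positive
    """
    ratios = [p / t for p, t in zip(p_row, track)]
    best = max(ratios)
    return [j for j, r in enumerate(ratios) if r >= best - tol]


# Hypothetical numbers: the bettor thinks horse 0 is most likely to win,
# yet the best ratio is on horse 1, so the whole budget goes there.
beliefs = [0.5, 0.3, 0.2]
track = [0.6, 0.2, 0.2]
print(best_horses(beliefs, track))
```

When several horses tie for the maximal ratio the bettor is indifferent among them, which is why the function returns a list rather than a single index.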

A technical difficulty is immediately apparent. We have already stated that
the final track probabilities $\pi_1, \ldots, \pi_n$ are proportional to the amounts bet on
$H_1, \ldots, H_n$, respectively (this is true whether or not the track retains a per-
centage). Thus, in practice, the $\pi_j$'s are not known until each $B_i$ has made his
bet. On the other hand, $B_i$ must know $\pi_1, \ldots, \pi_n$ before he can determine his
bet. There is, therefore, a serious question as to whether there exist final track
probabilities and individuals' bets compatible both with the bettors' strategies
and the pari-mutuel principle. It is the purpose of this note to show that such
probabilities and bets do exist and that the probabilities are in fact unique,





Received June 16, 1958. 




thus giving a well-defined notion of consensus. Of course, the "influence" of $B_i$
on the consensus will depend on his budget $b_i$, the case of equal influence being,
by definition, that of equal budgets.

It will be convenient to choose the unit of money so that $\sum_{i=1}^{m} b_i = 1$. We
shall also assume that each column of the matrix $P$ contains at least one positive
entry. If this were not so then, say, $p_{ij} = 0$ for all $i$ and none of the $B_i$'s would
bet on $H_j$ under any circumstances. We could then eliminate $H_j$ from considera-
tion entirely.

We shall now arithmetize the conditions which must be satisfied under the
pari-mutuel system. Let $\beta_{ij}$ be the amount which $B_i$ bets on $H_j$. These must
satisfy the budget relation

(1)  $\sum_{j=1}^{n} \beta_{ij} = b_i.$

Next, the pari-mutuel condition requires that

(2)  $\sum_{i=1}^{m} \beta_{ij} = \pi_j,$

which is simply the statement that the final track probability $\pi_j$ is proportional
to the total amount bet on $H_j$. Equality holds here because of the normalization
of the monetary unit. (We are using Greek letters to represent unknowns, Latin
letters for the given constants of the problem.)

Finally, we must express the fact that each $B_i$ is maximizing his expectation.
The reader will easily verify that the condition is the following:

(3)  if $\mu_i = \max_s \dfrac{p_{is}}{\pi_s}$ and $\beta_{ij} > 0$, then $\mu_i = \dfrac{p_{ij}}{\pi_j}$,

which states that $B_i$ bets only on those $H_j$'s for which his expectation is a maxi-
mum.

Nonnegative numbers $\pi_j$ and $\beta_{ij}$ which satisfy (1), (2) and (3) are called
equilibrium probabilities and bets. Their existence can be proved by means of
fixed-point theorems. We prefer, however, to prove existence in an elementary
manner using a variational method which seems to be of interest in itself. We
define a function $\phi$ and show that the variables which maximize it correspond to
a solution of (1), (2) and (3).
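
Conditions (1), (2) and (3) are easy to check mechanically for a proposed set of bets. The following is a minimal sketch, not part of the original note: the name `is_equilibrium`, the tolerance `tol`, and the sample matrices are assumptions. The track probabilities are computed from the bets by (2), so only (1) and (3) need testing.

```python
def is_equilibrium(P, b, beta, tol=1e-9):
    """Check conditions (1)-(3) for bets beta[i][j] >= 0.

    P    : m x n matrix of subjective probabilities
    b    : budgets, assumed to sum to 1
    beta : proposed bets, beta[i][j] = amount bettor i puts on horse j
    """
    m, n = len(P), len(P[0])
    # (2): track probabilities are the column totals of the bets.
    pi = [sum(beta[i][j] for i in range(m)) for j in range(n)]
    if any(x <= 0 for x in pi):
        return False  # the ratios in (3) would be undefined
    for i in range(m):
        if any(beta[i][j] < -tol for j in range(n)):
            return False
        # (1): each bettor spends exactly his budget.
        if abs(sum(beta[i]) - b[i]) > tol:
            return False
        # (3): money is placed only where p_ij / pi_j is maximal.
        mu = max(P[i][j] / pi[j] for j in range(n))
        for j in range(n):
            if beta[i][j] > tol and P[i][j] / pi[j] < mu - tol:
                return False
    return True


# Hypothetical data: two bettors with equal budgets, two horses.
P = [[0.5, 0.5], [0.9, 0.1]]
b = [0.5, 0.5]
print(is_equilibrium(P, b, [[0.0, 0.5], [0.5, 0.0]]))
print(is_equilibrium(P, b, [[0.25, 0.25], [0.25, 0.25]]))
```

In the second call the second bettor puts money on a horse whose ratio is not maximal for him, so condition (3) fails.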

The function $\phi$ has $mn$ arguments $\xi_{ij}$ and is defined by the rule:

$\phi(\xi_{11}, \ldots, \xi_{mn}) = \sum_{i=1}^{m} b_i \log \sum_{j=1}^{n} p_{ij}\xi_{ij},$

the variables $\xi_{ij}$ being restricted to the domain $D$ defined by:

(4)  $\xi_{ij} \ge 0$, for all $i, j$,

(5)  $\sum_{i=1}^{m} \xi_{ij} = 1$, for all $j$.







We shall come back and discuss the meaning of the function $\phi$ after we have
shown its relation to the pari-mutuel problem.

If we include minus infinity in the range of $\phi$, then $\phi$ is continuous on the
compact set $D$, hence attains a maximum at some point $(\xi_{11}, \ldots, \xi_{mn})$ of $D$.
At this maximum the term $\sum_{j=1}^{n} p_{ij}\xi_{ij}$ is positive for every $i$ (otherwise $\phi$ would
be minus infinity, which is clearly not its maximum value). The partial deriva-
tives of $\phi$ at the maximum are given by

$\dfrac{\partial \phi}{\partial \xi_{ij}} = \dfrac{b_i p_{ij}}{\sum_s p_{is}\xi_{is}}.$

We now assert:

EXISTENCE THEOREM. A set of equilibrium probabilities $\pi_j$ and bets $\beta_{ij}$ are
given by

(6)  $\pi_j = \max_i \dfrac{\partial \phi}{\partial \xi_{ij}} = \max_i \dfrac{b_i p_{ij}}{\sum_s p_{is}\xi_{is}},$

(7)  $\beta_{ij} = \xi_{ij}\pi_j.$


PROOF. We must show that the numbers $\pi_j$, $\beta_{ij}$ satisfy (1), (2) and (3). The
pari-mutuel condition (2) follows at once upon summing (7) on $i$ and using
condition (5) on the $\xi_{ij}$.

The verification of (1) and (3) depends on the fact that

(8)  if $\xi_{ij} > 0$, then $\pi_j = \dfrac{\partial \phi}{\partial \xi_{ij}}$.

To see this, suppose (8) is false and for some $i, j$ we have $\xi_{ij} > 0$ and
$\pi_j > \partial\phi/\partial\xi_{ij}$. By definition of $\pi_j$ we have $\pi_j = \partial\phi/\partial\xi_{kj} > \partial\phi/\partial\xi_{ij}$ for some
index $k$. This means that by slightly decreasing $\xi_{ij}$ and increasing $\xi_{kj}$ by the
same amount (which would not violate (4) or (5)), we could increase the value
of $\phi$, which is impossible since we are already at a maximum. Thus (8) is estab-
lished.


We next verify condition (1). From (7) and (8) we have

$\beta_{ij} = \xi_{ij}\pi_j = \dfrac{b_i p_{ij}\xi_{ij}}{\sum_s p_{is}\xi_{is}}.$

Summing the above on $j$,

$\sum_j \beta_{ij} = b_i \dfrac{\sum_j p_{ij}\xi_{ij}}{\sum_s p_{is}\xi_{is}} = b_i.$


Finally, we must prove (3). Since we assumed that none of the columns of the
matrix $P$ is identically zero, we know that each $\pi_j$ is positive. Thus from (6) we
have

(9)  $\dfrac{p_{ij}}{\pi_j} \le \dfrac{1}{b_i}\sum_s p_{is}\xi_{is},$

and $\mu_i$, as defined in (3), is $(1/b_i)\sum_s p_{is}\xi_{is}$. We see, then, from (7) and (8)
that if $\beta_{ij}$ is positive, $\xi_{ij}$ is positive and hence $\mu_i = p_{ij}/\pi_j$, thus showing that
(3) holds. (The fact that the $\pi_j$'s sum to 1 is, of course, a consequence of (1) and
(2) and the normalization of the $b_i$'s.) Q.E.D.

The function $\phi$ can be interpreted as follows. From (7) we have $\sum_j p_{ij}\xi_{ij} =
\sum_j \beta_{ij}(p_{ij}/\pi_j)$, which is exactly the subjective expectation of $B_i$ when he bets
$\beta_{ij}$ on $H_j$ with track probabilities $\pi_1, \ldots, \pi_n$. Thus, at equilibrium the bettors,
as a group, maximize a weighted sum of logarithms of subjective expectations,
the weights being the bettors' budgets. As noted previously, equilibrium prob-
abilities turn out to be unique, although equilibrium bets need not be unique.
Furthermore, not every collection of $\beta_{ij}$'s, obtained by having each $B_i$ act
according to his strategy at equilibrium probabilities, need be equilibrium bets.

As a final result we show 

UNIQUENESS THEOREM. Equilibrium probabilities are unique. 

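PROOF. Let $\pi_1, \ldots, \pi_n$ and $\bar\pi_1, \ldots, \bar\pi_n$ be equilibrium probabilities, let $\beta_{ij}$,
$\bar\beta_{ij}$ be corresponding bets, and $\mu_i$, $\bar\mu_i$ as defined in (3). Then for all $i, j, k$ we have:

$\beta_{ij}\mu_i\pi_j = \beta_{ij}p_{ij} \le \beta_{ij}\bar\mu_i\bar\pi_j,$

$\bar\beta_{ik}\bar\mu_i\bar\pi_k = \bar\beta_{ik}p_{ik} \le \bar\beta_{ik}\mu_i\pi_k,$

whence, since $\mu_i$, $\bar\mu_i$, $\pi_j$, $\pi_k$ are positive, $\beta_{ij}\bar\beta_{ik}(\bar\pi_k/\pi_k) \le \beta_{ij}\bar\beta_{ik}\bar\pi_j/\pi_j$. Sum-
ming on $j, k$ we get: $b_i \sum_k \bar\beta_{ik}(\bar\pi_k/\pi_k) \le b_i \sum_j \beta_{ij}\bar\pi_j/\pi_j$; dividing by $b_i$ and
summing on $i$: $\sum_k (\bar\pi_k^2/\pi_k) \le \sum_j \bar\pi_j = 1$.

Let $x_k = \bar\pi_k/\sqrt{\pi_k}$, $y_k = \sqrt{\pi_k}$. From the Cauchy-Schwarz inequality we
have:

$\left(\sum_k x_k y_k\right)^2 = \left(\sum_k \bar\pi_k\right)^2 = 1 \le \left(\sum_k x_k^2\right)\left(\sum_k y_k^2\right) = \left(\sum_k \dfrac{\bar\pi_k^2}{\pi_k}\right)\left(\sum_k \pi_k\right) \le 1.$

Thus the vectors $(x_1, \ldots, x_n)$, $(y_1, \ldots, y_n)$ are dependent and $\bar\pi_k = \mu\pi_k$. But
$\sum_k \bar\pi_k = \sum_k \pi_k = 1$, hence $\mu = 1$ and $\pi_k = \bar\pi_k$, proving uniqueness.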

The referee has suggested the following instructive example which indicates
the somewhat "pathological" nature of the pari-mutuel consensus. In the case
of two bettors with equal budgets, if the first bettor's subjective probability
distribution on two horses is $(\frac{1}{2}, \frac{1}{2})$, then the equilibrium probabilities will be
$(\frac{1}{2}, \frac{1}{2})$ regardless of the subjective probabilities of the second bettor, as the reader
will easily verify.
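
The referee's example can also be checked numerically. The sketch below is not the variational argument of this note: it uses an iteration of the proportional-response type (each bettor re-divides his budget in proportion to the contributions $p_{ij}\xi_{ij}$, with $\xi_{ij} = \beta_{ij}/\pi_j$ as in (7)), and its convergence to the equilibrium is taken here as an assumption. The function name `pari_mutuel`, the iteration count, and the second bettor's probabilities are likewise hypothetical.

```python
def pari_mutuel(P, b, iterations=5000):
    """Approximate equilibrium track probabilities and bets.

    P : m x n matrix of subjective probabilities (each row sums to 1)
    b : budgets, assumed to sum to 1
    """
    m, n = len(P), len(P[0])
    # Start with every bettor spreading his budget evenly.
    beta = [[b[i] / n for _ in range(n)] for i in range(m)]
    for _ in range(iterations):
        # Pari-mutuel condition (2): track probabilities are column totals.
        pi = [sum(beta[i][j] for i in range(m)) for j in range(n)]
        for i in range(m):
            # Contribution of horse j to bettor i's subjective expectation.
            gains = [P[i][j] * beta[i][j] / pi[j] if pi[j] > 0 else 0.0
                     for j in range(n)]
            total = sum(gains)
            # Re-divide the budget in proportion to those contributions.
            beta[i] = [b[i] * g / total for g in gains]
    pi = [sum(beta[i][j] for i in range(m)) for j in range(n)]
    return pi, beta


# First bettor believes (1/2, 1/2); the second bettor's row is arbitrary.
pi, beta = pari_mutuel([[0.5, 0.5], [0.9, 0.1]], [0.5, 0.5])
print([round(x, 3) for x in pi])
```

In this example the first bettor is indifferent at equilibrium, so the iteration approaches $(\frac{1}{2}, \frac{1}{2})$ only slowly; a generous iteration count is used for that reason.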





A NOTE ON PERFECT PROBABILITY¹
By Gopinath Kallianpur²
Michigan State University 


1. Introduction. The purpose of this note is to define and characterize a class 
of perfect probability spaces which we shall call D-spaces. Gnedenko and Kol- 
mogorov seem to have been the first to introduce explicitly the notion of perfect 
measure [1], although a special case ("normal space") was studied by Halmos
and von Neumann as long ago as 1942 [2]. An illuminating appendix by Doob 
in [1] (see also his remarks in the appendix to his own book [3]) further testifies 
to the fact that the notion of perfectness of a measure has been well known to 
mathematicians for quite some time. 

The triplet $(\Omega, \mathcal{F}, \mu)$ is said to be a perfect probability space if $\mu$ is a probability
over the $\sigma$-algebra $\mathcal{F}$ of subsets of $\Omega$ and if for every univalent, real valued, $\mathcal{F}$-meas-
urable function $f$ the following is true: For every linear set $A$ such that $f^{-1}(A) \in \mathcal{F}$,
there exists a linear Borel set $B$ with $B \subset A$ and

$\mu\{f^{-1}(B)\} = \mu\{f^{-1}(A)\}.$


While a perfect probability space $(\Omega, \mathcal{F}, \mu)$ has many desirable properties
([1], [4]), the definition of perfectness clearly involves the measure $\mu$ in an es-
sential manner. This raises the interesting question of defining classes of meas-
urable spaces $(\Omega, \mathcal{F})$ with the property that for every probability $\mu$, the space
$(\Omega, \mathcal{F}, \mu)$ is perfect. The Lusin spaces introduced by Blackwell [4] as well as the
D-spaces to be defined in the next section, possess this property. Theorem 3
gives a necessary and sufficient criterion for a D-space. This result is similar
to (though not identical with) an unsolved problem posed by Blackwell for
Lusin spaces ([4], Problem 2).


2. D-spaces: definition and characterization. We shall say that a linear set
$A$ is a D-set if $A$ is measurable with respect to $F$ for every Lebesgue Stieltjes
probability measure $F$. Borel sets and analytic sets are examples of D-sets.

A measurable space $(\Omega, \mathcal{F})$ will be called a D-space if

(1) $\mathcal{F}$ is a separable $\sigma$-field of subsets of $\Omega$, and

(2) The range of every univalent, real valued, $\mathcal{F}$-measurable function $f$ is
a D-set.

THEOREM 1. Let $(\Omega, \mathcal{F})$ be a D-space. Then if $\mathcal{A} \subset \mathcal{F}$ is any separable sub $\sigma$-field
of $\mathcal{F}$-sets the probability space $(\Omega, \mathcal{A}, \mu)$ is perfect for every probability $\mu$ defined

Received September 26, 1958. 

1 This research was supported in part by the United States Air Force through the Air 
Force Office of Scientific Research of the Air Research and Development Command, under
Contract No. AF 18 (600) - 442, Columbia University. Reproduction in whole or in part is 
permitted for any purpose of the United States Government. 

2 On leave from the Indian Statistical Institute. 




over $\mathcal{A}$. In particular, $(\Omega, \mathcal{F}, \mu)$ is perfect for every probability $\mu$ defined over $\mathcal{F}$.
The proof of this theorem is based on the following

LEMMA. A necessary and sufficient condition for $(\Omega, \mathcal{F}, \mu)$ to be perfect is that for
every univalent, real valued, $\mathcal{F}$-measurable function $f$ there exists an $\mathcal{F}$-set $\Omega_0$ such that

(2.1)  $\mu(\Omega_0) = 1$ and $f(\Omega_0)$ is a Borel set.

The above lemma is known in the literature, but we shall give a short proof
of the sufficiency. The necessity of (2.1) is almost obvious and its proof is omitted.
If (2.1) holds, Halmos and von Neumann have shown [2] that to every $\mathcal{F}$-meas-
urable function $f$ and every $\mathcal{F}$-set $B$ corresponds a measurable set $B_0$ contained
in $B$ such that $\mu(B_0) = \mu(B)$ and $f(B_0)$ is a Borel set. Hence, if $A$ is any linear
set with $f^{-1}A \in \mathcal{F}$, there exists a measurable subset $X_0$ of $f^{-1}A$ such that $\mu(X_0)
= \mu(f^{-1}A)$ and $f(X_0)$ is a Borel set. Since $f(X_0) \subset A$ we may write $f^{-1}A$ in the
form $f^{-1}A = f^{-1}\{f(X_0)\} \cup N$, where $N \subset f^{-1}(A) - X_0$ and $\mu(N) = 0$. Writing
$B = f(X_0)$ we have $B$ a Borel set, contained in $A$ and $\mu(f^{-1}B) = \mu(f^{-1}A)$, so
that $(\Omega, \mathcal{F}, \mu)$ is perfect.

To prove Theorem 1, let $\mu$ be an arbitrary probability over $\mathcal{F}$ and $\mathcal{F}_\mu$ the
$\sigma$-field obtained by completing $\mathcal{F}$ with respect to $\mu$. For any $\mathcal{F}$-measurable $f$ let
$\mathcal{F}^*$ be the $\sigma$-field of linear sets $A$ such that $f^{-1}A \in \mathcal{F}_\mu$, and let $\mu_f$ be the probability
over $\mathcal{F}^*$ defined by $\mu_f(A) = \mu(f^{-1}A)$. $\mu_f$ is then a complete probability over $\mathcal{F}^*$.
Finally, if $F$ is the Lebesgue Stieltjes measure generated by the distribution
function of $f$ and $\mathcal{A}_F$ the $\sigma$-field of $F$-measurable sets, it is easy to see
that $\mathcal{A}_F \subset \mathcal{F}^*$ and $\mu_f = F$ on $\mathcal{A}_F$. Since $(\Omega, \mathcal{F})$ is a D-space by our assumption,
$f(\Omega) \in \mathcal{A}_F$, so that there exists a Borel set $B \subset f(\Omega)$ such that $F(B) = 1$. Since
$F$ and $\mu_f$ agree on Borel sets, $\mu_f(B) = 1$. Now setting $\Omega_0 = f^{-1}(B)$ we
have $\mu(\Omega_0) = 1$ and $f(\Omega_0) = B$, a Borel set. The perfectness of $(\Omega, \mathcal{F}, \mu)$ then
follows by the Lemma. Since $(\Omega, \mathcal{F})$ is a D-space and every $\mathcal{A}$-measurable $f$ is
a fortiori $\mathcal{F}$-measurable, $(\Omega, \mathcal{A})$ is a D-space. The perfectness of $(\Omega, \mathcal{A}, \mu)$ for every
$\mu$ follows on replacing $\mathcal{F}$ by $\mathcal{A}$ in the above proof.

The converse of Theorem 1 is given by 

THEOREM 2. Let $(\Omega, \mathcal{F})$ be a measurable space with the following property:

(I) If $\mathcal{A}$ is any separable sub $\sigma$-field of $\mathcal{F}$-sets, the probability space $(\Omega, \mathcal{A}, \mu)$ is
perfect for every probability $\mu$ defined over $\mathcal{A}$.

Then, the range of every $\mathcal{F}$-measurable function $f$ is a D-set.

PROOF OF THEOREM 2. Let $\mathcal{A}$ be a separable sub $\sigma$-field of $\mathcal{F}$. Then there exists
an $\mathcal{F}$-measurable function $f$ such that $\mathcal{A}$ is the minimal $\sigma$-field with respect to
which $f$ is measurable. In other words, $\mathcal{A}$ is the $\sigma$-field of sets $f^{-1}(E)$ where $E$ is
a Borel set. Let $\nu$ be a Lebesgue Stieltjes probability measure and $\mathcal{F}_\nu$ the $\sigma$-field
of $\nu$-measurable sets. If $E$ is any subset of the real line it is known that there
exists a Borel set $F$ such that $F \supset E$ and such that, for every Borel
set $B \subset F - E$, we have $\nu(B) = 0$. We also have $\nu^*(E) = \nu(F)$, $\nu^*$ being outer
$\nu$-measure ([5], pp. 50-51). Such a set $F$ we shall call a $\nu$-cover of $E$. Let $R_f = f(\Omega)$,
the range of $f$. Denote by $K_1$ the $\nu$-cover of $R_f$. If $\nu^*(R_f) = 0$, it is a well-known
fact that $R_f \in \mathcal{F}_\nu$. If $\nu^*(R_f) > 0$ we also have $\nu(K_1) > 0$ and we may now define
a probability $\mu$ on $\mathcal{A}$ as follows: If $A \in \mathcal{A}$ then $A = f^{-1}(E)$ for some Borel set
$E$. Define (1) $\mu(A) = \nu(E \cap K_1)/\nu(K_1)$. First we show that (1) defines $\mu$ uniquely.
Suppose $E_1$ and $E_2$ are two Borel sets such that $A = f^{-1}(E_1) = f^{-1}(E_2)$. Then
clearly the Borel sets $E_1 - (E_1 \cap E_2)$ and $E_2 - (E_1 \cap E_2)$ are contained in the
complement of $R_f$. For $i = 1, 2$, $(E_i - E_1 \cap E_2) \cap K_1 \subset K_1 - R_f$
and $(E_i - E_1 \cap E_2) \cap K_1$ is a Borel set. Since $K_1$ is a $\nu$-cover of $R_f$ we have
$\nu[(E_i - E_1 \cap E_2) \cap K_1] = 0$. From this it follows that $\nu(E_1 \cap K_1) = \nu(E_2 \cap K_1)$,
proving that $\mu(A)$ is uniquely defined. Since $\Omega = f^{-1}R_1$, $R_1$ being the real line,
we have according to (1) $\mu(\Omega) = 1$. Thus $\mu$ is a probability defined over sets of
$\mathcal{A}$. However, it is to be remembered that $\mu$ is not defined for all sets of $\mathcal{F}$.

By the hypothesis of the theorem, $(\Omega, \mathcal{A}, \mu)$ is perfect. Therefore, since
$f^{-1}(R_f) \in \mathcal{A}$, there exists a Borel set $K_0 \subset R_f$ such that

$\mu(f^{-1}K_0) = \mu(f^{-1}R_f) = \mu(\Omega) = 1.$

But by the definition of $\mu$,

$\mu(f^{-1}K_0) = \dfrac{\nu(K_0 \cap K_1)}{\nu(K_1)} = \dfrac{\nu(K_0)}{\nu(K_1)},$

so that $\nu(K_0) = \nu(K_1)$.

Thus, there exist two Borel sets $K_0$ and $K_1$ such that $K_0 \subset R_f \subset K_1$ and
$\nu(K_0) = \nu(K_1)$. Hence, remembering that $\nu$ is complete, we have $R_f \in \mathcal{F}_\nu$. Since
$\nu$ is an arbitrary Lebesgue Stieltjes measure, the theorem is proved.

From the two theorems proved above we obtain the following characterization
of a D-space ($\mathcal{F}$ is assumed to be separable):

THEOREM 3. A necessary and sufficient condition for a measurable space $(\Omega, \mathcal{F})$
to be a D-space is that condition (I) of Theorem 2 be satisfied.

If $f$ is any $\mathcal{F}$-measurable function, $\mathcal{A}_f$, the minimal $\sigma$-field with respect to which
$f$ is measurable, is known to be separable. Hence, Theorem 3 can also be given
the following form:

THEOREM 3'. Let $(\Omega, \mathcal{F})$ be a measurable space, $f$ any univalent, real valued
$\mathcal{F}$-measurable function and $\mathcal{A}_f$ the $\sigma$-field defined as above. Then, a necessary and
sufficient condition in order that $(\Omega, \mathcal{A}_f, \mu)$ be perfect for all probability measures
$\mu$ is that $(\Omega, \mathcal{F})$ be a D-space.

Recently Blackwell has defined a Lusin space to be any $(\Omega, \mathcal{F})$ with $\mathcal{F}$ a sepa-
rable $\sigma$-field and with the property that the range of every real valued $\mathcal{F}$-meas-
urable $f$ is an analytic set. Since analytic sets in metric spaces are Lebesgue
Stieltjes measurable for every Lebesgue Stieltjes measure, it follows that a
Lusin space is also a D-space. Whether, in reality, the concept of a D-space is
more general than that of a Lusin space, we do not know. We have not succeeded
in demonstrating the existence of a $(\Omega, \mathcal{F})$ and a real-valued $\mathcal{F}$-measurable func-
tion whose range is a D-set other than an analytic or a Borel set. As far as we
are able to determine, very little seems to be known about the properties of
D-sets beyond the fact that a set $S$ on the real line is a D-set if and only if every
homeomorphic image of $S$ situated on the real line is Lebesgue measurable [6].
Nevertheless, the introduction of the notion of D-space is justified by the fact
that we are able to prove a characterizing property of such spaces given by
Theorem 3, whereas we are unable to prove a similar result for Lusin spaces.
In fact, Blackwell has posed the following unsolved problem for Lusin spaces:
If $(\Omega, \mathcal{F})$, with $\mathcal{F}$ separable, is such that $(\Omega, \mathcal{F}, \mu)$ is perfect for every probability
$\mu$ defined on $\mathcal{F}$, is $(\Omega, \mathcal{F})$ a Lusin space? Theorem 2 proves a somewhat weaker
property for D-spaces. Condition (I) of Theorem 2 is more stringent than the
restriction that $(\Omega, \mathcal{F}, \mu)$ be perfect for every probability $\mu$ on $\mathcal{F}$. If the latter
is given it is, of course, true that $(\Omega, \mathcal{A}, \mu)$ is perfect, $\mathcal{A}$ being any sub $\sigma$-field
of $\mathcal{F}$ and $\mu$ being regarded as the contraction on $\mathcal{A}$ of the probability $\mu$ already
defined on $\mathcal{F}$. Condition (I) goes beyond this in requiring the perfectness of
$(\Omega, \mathcal{A}, \mu)$ ($\mathcal{A}$ an arbitrary, separable sub $\sigma$-field of $\mathcal{F}$) for all probabilities $\mu$ on $\mathcal{A}$
and not merely for those $\mu$ which are contractions of probabilities defined over
the larger $\sigma$-field $\mathcal{F}$.


REFERENCES 


[1] B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for Sums of Independent
Random Variables, translated by K. L. Chung with an appendix by J. L. Doob,
Cambridge, Addison-Wesley, 1954.

[2] P. R. Halmos and J. v. Neumann, "Operator methods in classical mechanics II,"
Ann. of Math., Vol. 43 (1942), pp. 332-50.

[3] J. L. Doob, Stochastic Processes, New York, John Wiley and Sons, 1953.

[4] David Blackwell, "On a class of probability spaces," Third Berkeley Symposium,
Vol. 2 (1956), pp. 1-6.

[5] Paul R. Halmos, Measure Theory, New York, D. Van Nostrand, 1950.

[6] E. Marczewski, "Remarque sur la mesurabilité absolue" (Abstract), Colloquium
Mathematicum, Vol. 1 (1948), pp. 42-43.





ON THE DISTRIBUTION OF THE KOLMOGOROV-SMIRNOV 
D-STATISTIC 


By Pedro Egydio de Oliveira Carvalho¹


University of São Paulo
Summary. Gnedenko and Korolyuk [1] have pointed out that the exact dis-
tribution of the Kolmogorov-Smirnov D-statistic can be obtained explicitly by
solving a certain double-boundary random walk problem, which, in turn, is
solved by the principle of reflection. This principle is employed here in what is
believed to be a new way to derive Gnedenko's and Korolyuk's result.


A random walk problem. Let us consider a random walk on the half plane
$(t \ge 0, s)$, starting from the origin, such that at every point $(t, s)$ there are two
possible steps to take, either to $(t + 1, s + 1)$ or to $(t + 1, s - 1)$, each with
equal probability, and for some positive integer $n$, consider the paths from the
origin to the point $(2n, 0)$. Among these, let us denote, for any non-negative
integer $k \le n$, the set of all paths that have a point on the line $s = k$
by $C(d_+ \ge k)$, the set of all those paths that reach the line $s = -k$ by $C(d_- \ge k)$,
the set of all those that have a point on at least one of these two lines by $C(d \ge k)$,
and the set of all those that reach both $s = ak$ and $s = -k$, but go to the $s = ak$
line first, by $C(d_+ \ge ak \to d_- \ge k)$; the set $C(d_- \ge ak \to d_+ \ge k)$ is defined in the
same way with the roles of the upper and lower lines interchanged. Let the number
of elements in $C(\ )$ be $C^*(\ )$.

While it is well known ([2], p. 70) that

(1)  $C^*(d_+ \ge k) = \binom{2n}{n+k},$

$C^*(d \ge k)$ is more difficult to calculate.
Clearly, $C^*(d \ge k) = C^*(d_+ \ge k) + C^*(d_- \ge k)$ less the number of paths in
$C(d_+ \ge k) \cap C(d_- \ge k)$, or

(2)  $C^*(d \ge k) = C^*(d_+ \ge k) + C^*(d_- \ge k) - C^*(d_+ \ge k \to d_- \ge k) - C^*(d_- \ge k \to d_+ \ge k),$

and by symmetry

(3)  $C^*(d \ge k) = 2C^*(d_+ \ge k) - 2C^*(d_- \ge k \to d_+ \ge k).$

Because of (1) it remains to calculate the last term in (3). As the first step, we
show that for $i = 2, 3, \ldots, [n/k]$,

(4)  $C^*(d_- \ge ik) - C^*(d_- \ge ik \to d_+ \ge k) = C^*(d_- \ge (i-1)k \to d_+ \ge k).$
Received January 20, 1958; revised June 12, 1958. 
1 Posthumous note. Revisions were made and references to literature supplemented 
following the referee’s suggestions by Agnes Berger and Ruth Gold, School of Public Health 
and Administrative Medicine, Columbia University. 




A path counted on the left side of (4) is one of two types. The first type reaches
$s = -ik$, but does not reach $s = k$; the second reaches both lines but reaches
$s = k$ first.

Let $P$ be a path of the first type. By definition, it has points on $s = -ik$.
Let $p$ be the first of these. There must be points of $P$ on $s = -(i-1)k$ to the
left and also to the right of $p$; let the closest one to the left be $p_{1l}$ and to the
right, $p_{1r}$. Replace the portion of $P$ from $p_{1l}$ to $p_{1r}$ by its reflection about $s =
-(i-1)k$. The new path $P'$ contains the image of $p$, say $p_1$, falling on the line
$s = -(i-2)k$. On $s = -(i-2)k$, let the points of $P'$ nearest to $p_1$ be $p_{2l}$ on
the left and $p_{2r}$ on the right. Reflect $P'$ between $p_{2l}$ and $p_{2r}$ about $s = -(i-2)k$,
to get a new path $P''$. On $s = -(i-3)k$, let the points of $P''$ nearest to $p_2$ be
$p_{3l}$ on the left and $p_{3r}$ on the right. Continuing in this manner, $i$ reflections
will lead to a path $P^{(i)}$ that goes first to $s = -(i-1)k$ and then to $s = k$,
but does not reach $s = -ik$ except possibly after reaching $s = k$. Thus

$P^{(i)} \in C(d_- \ge (i-1)k \to d_+ \ge k)$

but

$P^{(i)} \notin C(d_- \ge ik \to d_+ \ge k).$
Conversely, let $Q$ be any path such that

$Q \in C(d_- \ge (i-1)k \to d_+ \ge k)$

but

$Q \notin C(d_- \ge ik \to d_+ \ge k).$

$Q$ has points on $s = k$; let $q$ be the first of these. On $s = 0$, let $q_l$ and $q_r$ be the
nearest points of $Q$ to the left and right of $q$, respectively. Let all the other
points of $Q$ on $s = 0$ to the right of $q_r$ be $a_1, \ldots, a_m$, in order. Let us reflect the
portions of $Q$ between $q_l$ and $q_r$ and at the same time between all those points $a_j$,
$a_{j+1}$ between which $Q$ reaches $s = k$ about the line $s = 0$. The new path $Q'$ does
not reach $k$. The reflection of $q$, say $q'$, lies on $s = -k$. Next reflect $Q'$ between
$q'$ and the nearest point to the left of it on $s = -k$. Continuing in the same man-
ner, the $i$th reflection will produce a path that reaches $s = -ik$ but never
reaches $s = k$.

Let $U$ be a path of the second type, i.e., one that reaches $s = k$ first and then
reaches $s = -ik$. Let $p = (t, 0)$ be the first return of $U$ to $s = 0$ after having
reached $s = -ik$. Let the portion of $U$ between $(0, 0)$ and $(t, 0)$ be represented
by the ordered sequence $\epsilon_1, \epsilon_2, \ldots, \epsilon_t$ where $\epsilon_j$ is a vector of length $\sqrt{2}$ and
slope $+1$ or $-1$. Let $U'$ be a path such that from $(0, 0)$ to $(t, 0)$ it is given by
the reversed sequence $\epsilon_t, \epsilon_{t-1}, \ldots, \epsilon_1$ and coincides with $U$ from $(t, 0)$ to $(2n, 0)$.
$U'$ is clearly in $C(d_- \ge ik \to d_+ \ge k)$ and therefore also in $C(d_- \ge (i-1)k \to
d_+ \ge k)$.

Conversely, let $V$ be a path such that

$V \in C(d_- \ge (i-1)k \to d_+ \ge k)$

and

$V \in C(d_- \ge ik \to d_+ \ge k)$

and let $q_0 = (t', 0)$ be the first return of $V$ to $s = 0$ after having reached $s = k$.
Reversing the steps between $(0, 0)$ and $q_0$ uniquely determines a path of type II,
completing the proof of (4).

Note that (4) has the structure

$A_i - B_i = B_{i-1}$

where $A_i = C^*(d_- \ge ik)$ is known by (1). Thus knowing $B_i$ for any $i$ implies
knowing all $B_j$ for $j < i$. But for $i = [n/k]$, (4) gives

$\binom{2n}{n + [n/k]k} - 0 = C^*\left(d_- \ge \left(\left[\tfrac{n}{k}\right] - 1\right)k \to d_+ \ge k\right) = B_{[n/k]-1}.$

Carrying out the substitutions gives

$C^*(d_- \ge k \to d_+ \ge k) = \sum_{i=2}^{[n/k]} (-1)^i \binom{2n}{n+ik}$

and from (3)

$C^*(d \ge k) = 2\sum_{i=1}^{[n/k]} (-1)^{i+1} \binom{2n}{n+ik}.$


Application to the Kolmogorov-Smirnov problem. Let $X = (x_1 < x_2 < \cdots < x_n)$
and $Y = (y_1 < y_2 < \cdots < y_n)$ be two independent samples of ordered inde-
pendent observations having the same continuous cumulative distribution
function. Suppose $x_i \ne y_j$ $(i, j = 1, 2, \ldots, n)$ and let the two samples be
combined and arranged in increasing order of magnitude, say $Z =
(z_1 < z_2 < \cdots < z_{2n})$. Let $S_1(x)$ be the number of observed values $x_i$ which are
less than or equal to $x$ and $S_2(x)$ the number of observed $y_i$'s less than or equal
to $x$.

Let

$D^+ = \max_x \,(S_1(x) - S_2(x)),$

$D = \max_x \,|S_1(x) - S_2(x)|.$
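
The two statistics can be computed directly from the definitions. The following is a minimal sketch under stated assumptions: the name `ks_counts` and the sample values are hypothetical, and, as in the text, $S_1$ and $S_2$ are counts rather than proportions, so the statistics are integers. Since $S_1 - S_2$ is a step function that changes only at sample points and equals 0 to the left of all of them, it suffices to evaluate it at the pooled observations.

```python
def ks_counts(xs, ys):
    """Return (D_plus, D) computed from the counts S1(x) and S2(x)."""
    d_plus = 0  # S1 - S2 equals 0 below the smallest observation
    d = 0
    for z in sorted(xs + ys):
        s1 = sum(1 for x in xs if x <= z)  # S1(z): number of x's <= z
        s2 = sum(1 for y in ys if y <= z)  # S2(z): number of y's <= z
        d_plus = max(d_plus, s1 - s2)
        d = max(d, abs(s1 - s2))
    return d_plus, d


# Hypothetical samples of size n = 3 with no ties between them.
print(ks_counts([1, 2, 5], [3, 4, 6]))
print(ks_counts([3, 4, 6], [1, 2, 5]))
```

Interchanging the two samples leaves $D$ unchanged but not $D^+$, as the two calls show.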


The limiting distribution of $D$ was found by Kolmogorov [3, 4] and Smirnov
[5, 6, see also 7] and an iterative method for its exact distribution has been
given by Massey [8]. Gnedenko and Korolyuk recognized that a one to one
correspondence exists between the set of all $Z$ and all paths from $(0, 0)$ to $(2n, 0)$
in the above discussed random walk: Starting from $(0, 0)$, we move to $(1, 1)$ if
$z_1$ is from $Y$, to $(1, -1)$ if $z_1$ is from $X$, and so on. In particular, samples $Z$ for
which $D \ge k$ correspond to paths in $C(d \ge k)$ and vice versa. Thus we get
Gnedenko's and Korolyuk's result

$P\{D < k\} = 1 - \dfrac{C^*(d \ge k)}{\binom{2n}{n}}.$

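
For small $n$ the counting formula can be checked by exhaustive enumeration. The sketch below is an illustration only; the function names are assumptions. It lists every path of $2n$ steps of $\pm 1$ that returns to 0, counts those touching $s = k$ or $s = -k$, and compares the count with $2\sum_{i=1}^{[n/k]} (-1)^{i+1}\binom{2n}{n+ik}$.

```python
from itertools import product
from math import comb


def count_paths_brute_force(n, k):
    """Count paths from (0,0) to (2n,0) that touch s = k or s = -k."""
    count = 0
    for steps in product((1, -1), repeat=2 * n):
        if sum(steps) != 0:
            continue  # the path must end at (2n, 0)
        s = 0
        for step in steps:
            s += step
            if abs(s) >= k:
                count += 1
                break
    return count


def count_paths_formula(n, k):
    """C*(d >= k) as the alternating sum of binomial coefficients."""
    return 2 * sum((-1) ** (i + 1) * comb(2 * n, n + i * k)
                   for i in range(1, n // k + 1))


def prob_d_less_than(n, k):
    """P{D < k} for two samples of size n, with D measured in counts."""
    return 1 - count_paths_formula(n, k) / comb(2 * n, n)


for n in range(1, 6):
    for k in range(1, n + 1):
        assert count_paths_brute_force(n, k) == count_paths_formula(n, k)
print(prob_d_less_than(5, 3))
```

The enumeration visits all $2^{2n}$ step sequences, so it is practical only for small $n$; the formula itself needs just $[n/k]$ binomial coefficients.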
REFERENCES 


[1] B. V. Gnedenko and V. S. Korolyuk, "On the maximum discrepancy between two
empirical distributions", Doklady Akad. Nauk. SSSR (N.S.), Vol. 80 (1951), pp.
525-528. Reviewed by W. Feller in Mathematical Reviews, Vol. 13 (1952), pp. 570-
571.

[2] W. Feller, An Introduction to Probability Theory and its Applications, Vol. 1, John
Wiley and Sons, Inc., New York, 1957.

[3] A. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione", Inst.
Ital. Attuari, Giorn., Vol. 4 (1933), pp. 1-11.

[4] A. Kolmogorov, "Über die Grenzwertsätze der Wahrscheinlichkeitsrechnung", Bul-
letin [Izvestija] Académie des Sciences URSS, (1933), pp. 363-372.

[5] N. Smirnov, "Ob uklonenijah empiričeskoi krivoi raspredelenija", Recueil Math-
ématique (Matematičeskii Sbornik), N.S. Vol. 6 (48) (1939), pp. 3-26.

[6] N. Smirnov, "On the estimation of the discrepancy between empirical curves of distri-
bution for two independent samples", Bulletin Mathématique de l'Université de
Moscou, Vol. 2 (1939), fasc. 2.

[7] W. Feller, "On the Kolmogorov-Smirnov limit theorems for empirical distributions",
Ann. Math. Stat., Vol. XIX, No. 2 (1948), pp. 177-189.

[8] F. Massey, Jr., "The distribution of the maximum deviation between two sample
cumulative step functions", Ann. Math. Stat., Vol. 22, No. 1 (1951), pp. 125-131.





EQUALITY OF MORE THAN TWO VARIANCES AND OF MORE 
THAN TWO DISPERSION MATRICES AGAINST 
CERTAIN ALTERNATIVES¹


By R. Gnanadesikan²


University of North Carolina 


0. Introduction and summary. In this paper, using the heuristic union-inter- 
section principle [4], two tests are proposed, and the associated simultaneous 
confidence bounds on parametric functions which are measures of a certain type 
of departure from the respective null hypotheses are obtained. The first test is 
for the equality of $(k + 1)$ variances $(k \ge 2)$ of $(k + 1)$ univariate normal
populations, wherein we choose one of the variances as a standard (of course,
unknown), and compare the other $k$ variances with it. The alternative to the
hypothesis is that not all the $k$ variances are equal to the standard one. The
proposed test may be called the simultaneous variance ratios test. The well-
known Hartley's $F_{\max}$ test [2] for the case of equal sample sizes is not equivalent
to the present test even when all samples are of the same size since the alterna- 
tives in the two cases are different. In the alternative in Hartley’s test, aside 
from the inequality of the k variances to the standard one, the mutual inequality 
of the k variances also plays an important role. The second test proposed in this 
paper, is a multivariate extension of the first. This paper also considers the 
distribution problems that arise in connection with both the tests. The non- 
availability of tables at the moment makes the immediate practical application 
of the tests and the associated confidence bounds not possible. 

Sections 1, 2, and 3 deal with the univariate problem and Sections 4 and 5 
deal with the multivariate extensions. 


1. The simultaneous variance ratios test. For $(k + 1)$ univariate normal
populations we want to test the composite hypothesis $H_0: \sigma_1^2 = \sigma_2^2 = \cdots =
\sigma_k^2 = \sigma_0^2$. Suppose we have independent random samples of sizes $(n_i + 1)$, $i = 0,
1, 2, \ldots, k$, from the $(k + 1)$ populations, and let $s_i^2$ be the estimate of $\sigma_i^2$ based
on $n_i$ degrees of freedom for $i = 0, 1, \ldots, k$. Let us choose $\sigma_0^2$ as standard and
compare $\sigma_1^2, \ldots, \sigma_k^2$ with $\sigma_0^2$, so that $H_0$ is equivalent to $H_0: \sigma_1^2/\sigma_0^2 = \cdots =
\sigma_k^2/\sigma_0^2 = 1$. The alternative hypothesis is $H_1$: Not $H_0$, i.e., at least one
$\sigma_i^2/\sigma_0^2 \ne 1$. For each hypothesis like $H_{0i}: \sigma_i^2/\sigma_0^2 = 1$ against $H_{1i}: \sigma_i^2/\sigma_0^2 \ne 1$, we
have the well-known test with the acceptance region,

(1.1)  $F_{i1} \le F_i(n_i, n_0) \le F_{i2},$

where $F_i(n_i, n_0) = s_i^2/s_0^2$ has the central F-distribution with $n_i$ and $n_0$ degrees of


Received August 5, 1957; revised July 29, 1958. 

1 This research was sponsored by the United States Air Force through the Office of the 
Air Research and Development Command. 

2 Now with the Procter and Gamble Company, Cincinnati, Ohio.




freedom under $H_{0i}$. It is also easily seen that $H_0 = \bigcap_{i=1,\ldots,k} H_{0i}$ and $H_1 =
\bigcup_{i=1,\ldots,k} H_{1i}$. Therefore, by the heuristic union-intersection principle [4], we
shall take for our test of $H_0$ the acceptance region,

(1.2)  $F_{11} \le F_1(n_1, n_0) \le F_{12}, \quad F_{21} \le F_2(n_2, n_0) \le F_{22}, \quad \ldots, \quad F_{k1} \le F_k(n_k, n_0) \le F_{k2},$

which is the intersection (over $i$) of the regions (1.1). For the critical region,
therefore, we take the union (over $i$) of the complements of the regions (1.1).

The optimum choice of $F_{i1}$, $F_{i2}$, for $i = 1, 2, \ldots, k$, is not known, and, in
the absence of this knowledge, the following choice is suggested as one possible
way:

Following the usual procedure for obtaining a Type I union-intersection
region, choose $F_{i1}$ and $F_{i2}$ such that all the individual regions (1.1) will have the
same size $(1 - \alpha^*)$, where $\alpha^*$ is such that the size of the intersection (1.2) is
$(1 - \alpha)$, for a preassigned $\alpha$. In general, of course, $(1 - \alpha) \ne (1 - \alpha^*)^k$, but
assuming non-triviality, given $\alpha$ we can determine $\alpha^*$ and vice versa. This con-
dition, however, still does not determine the region (1.2) completely. In order to
do so, we impose the further condition that, for each $i$, the test with acceptance
region (1.1) be locally unbiased (in the sense of Neyman). This latter condition,
as will be shown in Section 2, ensures a desirable property of the simultaneous
variance ratios test with acceptance region (1.2).

$F_i(n_i, n_0)$, for $i = 1, 2, \ldots, k$, are quasi-independent variance ratios in the
sense of [3], i.e., the numerators, except for constant multipliers, are distributed
independently as $\chi^2$ variates and are so distributed independently of their com-
mon denominator which also, except for a constant multiplier, is distributed as a
$\chi^2$ variate. The joint distribution of such quasi-independent variance ratios is
given in [3] and using an approach which is essentially the same as that con-
tained in that paper we can obtain a recurrence relation to aid us in evaluating
the probability integral associated with the simultaneous variance ratios test
(1.2). It must be noted, however, that the recurrence relation solves the problem
only in theory and for practical purposes tables of the probability integral need
to be constructed.
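
In the absence of tables, the effect of the common denominator on the size of the region (1.2) can be illustrated by simulation. The sketch below rests on several assumptions that are not in the paper: equal degrees of freedom for all samples, equal-tail cut-offs estimated from the simulated ratios (the paper asks instead for locally unbiased cut-offs), and the hypothetical name `simulate_joint_size`. Under $H_0$ it estimates the size $1 - \alpha^*$ of one region (1.1) and the size of the intersection (1.2).

```python
import random


def simulate_joint_size(k=3, n0=4, ni=4, alpha_star=0.10, reps=20000, seed=0):
    """Monte Carlo estimate of (single-region size, joint size) under H0."""
    rng = random.Random(seed)
    rows = []  # each row holds k variance ratios sharing one denominator
    for _ in range(reps):
        # chi-square with n d.f. is Gamma(shape n/2, scale 2); divide by n.
        s0 = rng.gammavariate(n0 / 2, 2.0) / n0
        rows.append([rng.gammavariate(ni / 2, 2.0) / ni / s0
                     for _ in range(k)])
    # Equal-tail cut-offs taken from the pooled simulated ratios
    # (an assumption; not the locally unbiased choice of the paper).
    pooled = sorted(r for row in rows for r in row)
    lo = pooled[round(len(pooled) * alpha_star / 2)]
    hi = pooled[round(len(pooled) * (1 - alpha_star / 2)) - 1]
    single = sum(lo <= r <= hi for r in pooled) / len(pooled)
    joint = sum(all(lo <= r <= hi for r in row) for row in rows) / reps
    return single, joint


single, joint = simulate_joint_size()
print(single, joint, single ** 3)
```

Because the $k$ ratios share the denominator $s_0^2$, the estimated joint size differs from the $k$th power of the individual size, in line with the remark that in general $(1 - \alpha) \ne (1 - \alpha^*)^k$.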


2. Properties of the power of the test proposed in Section 1. We shall first
note that the power, or, equivalently, the probability of the second kind of error,
$\beta$, of the test could involve as parameters only the $k$ ratios $\delta_i = \sigma_i^2/\sigma_0^2$ $(i =
1, 2, \ldots, k)$:

$\beta = P[F_{11} \le F_1(n_1, n_0) \le F_{12}, \ldots, F_{k1} \le F_k(n_k, n_0) \le F_{k2} \mid H_1]$

$= P\left[\dfrac{F_{11}}{\delta_1} \le \dfrac{F_1(n_1, n_0)}{\delta_1} \le \dfrac{F_{12}}{\delta_1}, \ldots, \dfrac{F_{k1}}{\delta_k} \le \dfrac{F_k(n_k, n_0)}{\delta_k} \le \dfrac{F_{k2}}{\delta_k}\right],$

where for $i = 1, 2, \ldots, k$, $F_i(n_i, n_0)/\delta_i$ has a central F-distribution with
degrees of freedom $n_i$ and $n_0$ and the different $F$'s are quasi-independent. It now
follows that $\beta$ could involve as parameters only $\delta_1, \delta_2, \ldots, \delta_k$.







We shall next show that, for the choice of $F_{i1}$, $F_{i2}$ mentioned in Section 1, the
power of the test has the monotonicity property, i.e., that as each $\delta_i$ $(i = 1,
2, \ldots, k)$ tends away from unity the power monotonically increases. It is to be
noted that Ramachandran [3] has proved a similar property of the simultaneous
analysis of variance test.


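Let us write $v_i = n_i s_i^2/\sigma_i^2$, $i = 0, 1, 2, \ldots, k$, and let $p(v_i)$ denote the
probability density function of $\chi^2$ with $n_i$ degrees of freedom. Then we have

$\beta = \int_0^\infty p(v_0) \left[ \prod_{i=1}^{k} \int_{n_i F_{i1} v_0/(n_0 \delta_i)}^{n_i F_{i2} v_0/(n_0 \delta_i)} p(v_i)\, dv_i \right] dv_0,$

and, differentiation under the integral sign being valid,

(2.1)   $\dfrac{\partial \beta}{\partial \delta_1} = \int_0^\infty p(v_0) \left[ \dfrac{\partial}{\partial \delta_1} \int_{n_1 F_{11} v_0/(n_0 \delta_1)}^{n_1 F_{12} v_0/(n_0 \delta_1)} p(v_1)\, dv_1 \right] \prod_{i=2}^{k} \left[ \int_{n_i F_{i1} v_0/(n_0 \delta_i)}^{n_i F_{i2} v_0/(n_0 \delta_i)} p(v_i)\, dv_i \right] dv_0$

$\qquad = \text{const.} \int_0^\infty p(v_0) \left( \dfrac{v_0}{\delta_1} \right)^{n_1/2} \dfrac{1}{\delta_1}\, f(v_0) \prod_{i=2}^{k} \left[ \int_{n_i F_{i1} v_0/(n_0 \delta_i)}^{n_i F_{i2} v_0/(n_0 \delta_i)} p(v_i)\, dv_i \right] dv_0,$

where

$f(v_0) = F_{11}^{n_1/2} \exp\left\{ -\dfrac{n_1 F_{11} v_0}{2 n_0 \delta_1} \right\} - F_{12}^{n_1/2} \exp\left\{ -\dfrac{n_1 F_{12} v_0}{2 n_0 \delta_1} \right\}$

and the constant factor is non-negative. Noticing that $v_0$, $\delta_i$, $F_{i2} > F_{i1}$ $(i = 1,
2, \ldots, k)$ are all essentially positive and that each of the integrals in the product
$\prod_{i=2}^{k}$ is positive (lying between 0 and 1), we may apply a well-known result
in Calculus to obtain $\partial\beta/\partial\delta_1 \gtrless 0$ according as

$F_{11}^{n_1/2} \exp\left\{ -\dfrac{n_1 F_{11} v_0}{2 n_0 \delta_1} \right\} \gtrless F_{12}^{n_1/2} \exp\left\{ -\dfrac{n_1 F_{12} v_0}{2 n_0 \delta_1} \right\},$

i.e., according as

(2.2)   $\left( \dfrac{F_{11}}{F_{12}} \right)^{n_1/2} \gtrless \exp\left\{ -\dfrac{n_1 v_0 (F_{12} - F_{11})}{2 n_0 \delta_1} \right\}.$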







It can be shown that the condition of local unbiasedness of the region
$F_{11} \le F_1(n_1, n_0) \le F_{12}$ reduces to

(2.3)   $\left( \dfrac{F_{11}}{F_{12}} \right)^{n_1/2} = \exp\left\{ -\dfrac{n_1 v_0 (F_{12} - F_{11})}{2 n_0} \right\}.$

Substituting in (2.2), we see, after some simplification, that

(2.4)   $\partial\beta/\partial\delta_1 \gtrless 0$ according as $\delta_1 \lessgtr 1$ (irrespective of $v_0 > 0$).

Hence, the power is a monotone increasing or decreasing function of $\delta_1$
according as $\delta_1 \gtrless 1$. The same property with respect to $\delta_2, \ldots, \delta_k$ can be proved
similarly.

Also, if $\delta'(1 \times k) = (\delta_1, \delta_2, \ldots, \delta_k)$, then from (2.1) and (2.3) it appears
that if $F_{i1}$, $F_{i2}$ are chosen so as to make the region (1.1) locally unbiased, then

(2.5)   $\left. \dfrac{\partial \beta}{\partial \delta_i} \right|_{\delta' = (1, \ldots, 1)} = 0.$

Therefore, the proposed test is locally unbiased, and, as a consequence of its
monotonicity property, it will be completely unbiased.


3. The associated simultaneous confidence bounds on $\sigma_i^2/\sigma_0^2$ $(i = 1, 2, \ldots, k)$.
Under the alternative hypothesis it is known that $F_i(n_i, n_0)/\delta_i$, for $i = 1,
2, \ldots, k$, are distributed as quasi-independent variance ratios. Hence, we can
make the following simultaneous statements:

(3.1)   $F_{11} \le \dfrac{F_1(n_1, n_0)}{\delta_1} \le F_{12}, \quad \ldots, \quad F_{k1} \le \dfrac{F_k(n_k, n_0)}{\delta_k} \le F_{k2},$

where $F_{i1}$, $F_{i2}$ $(i = 1, 2, \ldots, k)$ are such that

(3.2)   $P[F_{11} \le F_1(n_1, n_0) \le F_{12}, \ldots, F_{k1} \le F_k(n_k, n_0) \le F_{k2}] = (1 - \alpha),$

so that the probability associated with (3.1) is $(1 - \alpha)$.
By inverting the statements (3.1), it is easily seen that we obtain the following
simultaneous confidence interval statements,

(3.3)   $\dfrac{s_i^2}{s_0^2 F_{i2}} \le \dfrac{\sigma_i^2}{\sigma_0^2} \le \dfrac{s_i^2}{s_0^2 F_{i1}}, \quad i = 1, 2, \ldots, k,$

with a joint confidence coefficient $(1 - \alpha)$.

These results are valid for all choices of $F_{i1}$, $F_{i2}$ $(i = 1, 2, \ldots, k)$ satisfying
(3.2). However, if $F_{i1}$, $F_{i2}$ $(i = 1, 2, \ldots, k)$ are chosen as in Section 1, then,
from the unbiasedness and monotonicity properties of the associated test
(proved in Section 2), we shall have the desirable property of monotonically
increasing shortness (in terms of probability of covering wrong values) for the
confidence bounds (3.3).
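The inversion leading to (3.3) is simple arithmetic once the limits are known. The following is a minimal sketch in Python, assuming the limits $F_{i1}$, $F_{i2}$ have already been obtained elsewhere (for example from tables of the probability integral); the function name and argument layout are hypothetical and serve only as illustration.

```python
def variance_ratio_bounds(s2, s0_2, limits):
    """Simultaneous bounds on sigma_i^2 / sigma_0^2, i = 1..k, as in (3.3).

    s2     : sample variances s_1^2, ..., s_k^2
    s0_2   : sample variance s_0^2 of the standard population
    limits : pairs (F_i1, F_i2), assumed to be supplied by the user
    """
    if s0_2 <= 0:
        raise ValueError("s0_2 must be positive")
    if len(s2) != len(limits):
        raise ValueError("need one pair of limits per sample variance")
    bounds = []
    for s_i2, (f_lo, f_hi) in zip(s2, limits):
        if not 0 < f_lo < f_hi:
            raise ValueError("limits must satisfy 0 < F_i1 < F_i2")
        ratio = s_i2 / s0_2  # observed variance ratio s_i^2 / s_0^2
        # Dividing by the upper limit gives the lower bound, and conversely.
        bounds.append((ratio / f_hi, ratio / f_lo))
    return bounds
```

For instance, with illustrative inputs `variance_ratio_bounds([4.0, 9.0], 2.0, [(0.5, 4.0), (0.25, 2.0)])`, the first interval is obtained from the observed ratio 2.0 divided by 4.0 and by 0.5. The joint confidence coefficient attached to the intervals is whatever probability the supplied limits satisfy in (3.2).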


4. The multivariate test. The notation used in this and the following section
is fairly standard and, for example, is the same as that used in [6], [7], [8]. The
case $k = 1$ is treated in these papers, but now we shall proceed to consider the
case $k \ge 1$, and, as will be seen, the results for the case $k \ge 1$ are similar in
form to those for the case $k = 1$.

In the multivariate situation we need a test for the hypothesis of equality of
the dispersion matrices of $(k + 1)$ non-singular $p$-variate normal populations,
$N\{\xi_i, \Sigma_i\}$, $i = 0, 1, 2, \ldots, k$. That is, the null hypothesis is
$H_0: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k = \Sigma_0$. Suppose that $X_i[p \times (n_i + 1)]$, $i = 0, 1, \ldots, k$,
where $p \le n_i$ for $i = 0, 1, \ldots, k$, are mutually independent random samples
respectively from the $(k + 1)$ normal populations. Let

(4.1)   $n_i S_i(p \times p) = X_i X_i' - (n_i + 1)\,\bar{x}_i \bar{x}_i', \quad i = 0, 1, 2, \ldots, k,$

where the $\bar{x}_i(p \times 1)$'s are the sample mean vectors and $S_0, S_1, \ldots, S_k$ are sample
dispersion matrices estimating $\Sigma_0, \Sigma_1, \ldots, \Sigma_k$ respectively. The sample
dispersion matrices have independent Wishart distributions with $S_i$ having the
distribution

(4.2)   $p(S_i)\, dS_i = \text{const.}\ |\Sigma_i|^{-n_i/2} \exp\left[ -\tfrac{1}{2}\, n_i \operatorname{tr} \Sigma_i^{-1} S_i \right] |S_i|^{(n_i - p - 1)/2}\, dS_i,$

for $i = 0, 1, 2, \ldots, k$.

Just as in the univariate case discussed in Section 1, we may, for the
multivariate case, choose $\Sigma_0$ as standard and compare the $k$ matrices $\Sigma_1, \ldots, \Sigma_k$ with
$\Sigma_0$. Notice that $S_0, S_1, \ldots, S_k$ are symmetric and almost everywhere (i.e.,
except on a set of probability measure zero) positive definite, and $\Sigma_0, \Sigma_1, \ldots, \Sigma_k$
are symmetric positive definite matrices, being dispersion matrices of
non-singular $p$-variate normal distributions.

Consider, in analogy with (1.2), the test for $H_0: \Sigma_1 = \cdots = \Sigma_k = \Sigma_0$, whose
acceptance region is

(4.3)   $\lambda_{j1} \le \dfrac{c_{\min}(S_j)}{c_{\max}(S_0)} \le \dfrac{c_{\max}(S_j)}{c_{\min}(S_0)} \le \lambda_{j2}, \quad j = 1, 2, \ldots, k.$

Here also the optimum choice of $\lambda_{j1}$, $\lambda_{j2}$, for $j = 1, 2, \ldots, k$, is not known. We
shall, however, consider a choice in analogy with our choice, discussed in Section
1, for the univariate case. Let us choose $\lambda_{j1}$, $\lambda_{j2}$, for $j = 1, 2, \ldots, k$, so that all
the individual regions,

(4.4)   $\lambda_{j1} \le \dfrac{c_{\min}(S_j)}{c_{\max}(S_0)} \le \dfrac{c_{\max}(S_j)}{c_{\min}(S_0)} \le \lambda_{j2},$

are of the same size $(1 - \alpha^*)$, where $\alpha^*$ is such that the region of intersection,
(4.3), is of size $(1 - \alpha)$. Here again, in general, $(1 - \alpha) \neq (1 - \alpha^*)^k$, but we
shall assume non-triviality, i.e., given $\alpha$ we can find $\alpha^*$ and vice versa. As a
further condition to determine the $\lambda$'s completely, let us impose the condition
that the individual tests with acceptance regions (4.4) are to be locally unbiased.

Investigations, similar to those of Section 2, for desirable power properties
which might follow from the second condition on the $\lambda$'s have not been made in
this inquiry due to the difficulties involved.

A method of evaluating the individual probability integrals like (4.4) is given
by the author in [1]. The probability integral associated with (4.3), however,
needs to be studied with a view to tabulation.


5. The associated simultaneous confidence bounds on $c(\Sigma_i \Sigma_0^{-1})$ for $i = 1,
2, \ldots, k$. Since $S_0, S_1, \ldots, S_k$ are independently distributed, the joint
distribution of the $S$'s is obtained by taking the product of the distributions in (4.2).

Next let us make the following transformations,

(5.1)   $\Sigma_i(p \times p) = A_i(p \times p)\, D_{\gamma_i}(p \times p)\, A_i'(p \times p), \quad i = 0, 1, 2, \ldots, k,$

where each $A_i$ is an orthogonal matrix, and the $p$ diagonal elements of each $D_{\gamma_i}$
are the $p$ characteristic roots, $\gamma_{i1}, \ldots, \gamma_{ip}$, of the corresponding $\Sigma_i$ (for $i = 0,
1, 2, \ldots, k$).


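Then the joint distribution of $S_1, \ldots, S_k$ and $S_0$ may be rewritten as

$\text{const.} \prod_{i=0}^{k} \left( \prod_{j=1}^{p} \gamma_{ij}^{-n_i/2} \right) \exp\left[ -\tfrac{1}{2} \operatorname{tr} \sum_{i=0}^{k} n_i A_i D_{1/\gamma_i} A_i' S_i \right] \prod_{i=0}^{k} |S_i|^{(n_i - p - 1)/2}\, dS_i,$

or, remembering that $\operatorname{tr}[A(p \times q) B(q \times p)] = \operatorname{tr}[B(q \times p) A(p \times q)]$ (e.g. [5],
p. A-1), as

(5.2)   $\text{const.} \exp\left[ -\tfrac{1}{2} \operatorname{tr} \sum_{i=0}^{k} n_i D_{1/\sqrt{\gamma_i}} A_i' S_i A_i D_{1/\sqrt{\gamma_i}} \right] \prod_{i=0}^{k} |S_i|^{(n_i - p - 1)/2}\, dS_i.$

Let us next make the transformations

(5.3)   $D_{1/\sqrt{\gamma_i}}\, A_i' S_i A_i\, D_{1/\sqrt{\gamma_i}} = S_i^*, \quad i = 0, 1, \ldots, k.$

Then the joint distribution of $S_1^*, \ldots, S_k^*$ and $S_0^*$ is seen to be

(5.4)   $\text{const.} \exp\left[ -\tfrac{1}{2} \operatorname{tr} \sum_{i=0}^{k} n_i S_i^* \right] \prod_{i=0}^{k} |S_i^*|^{(n_i - p - 1)/2}\, dS_i^*,$

which is of the same form as the joint distribution of $S_1, \ldots, S_k$ and $S_0$ under
$H_0$. Hence it follows that we can find constants $\lambda_{j1}$, $\lambda_{j2}$ $(j = 1, 2, \ldots, k)$ such
that the simultaneous statements

(5.5)   $\lambda_{j1} \le \dfrac{c_{\min}(S_j^*)}{c_{\max}(S_0^*)} \le \dfrac{c_{\max}(S_j^*)}{c_{\min}(S_0^*)} \le \lambda_{j2}, \quad \text{for } j = 1, 2, \ldots, k,$

have a joint probability $= (1 - \alpha)$, for a preassigned $\alpha$. It is well known that all
non-zero $c(AB)$ = the non-zero $c(BA)$ (e.g. [5], p. 138). Hence, $c(A_i' S_i A_i) =
c(S_i)$, since $A_i$ is orthogonal. Furthermore,

$c(S_i^*) = c(D_{1/\gamma_i} A_i' S_i A_i)$, where $S_i^*(p \times p)$,

for $i = 0, 1, \ldots, k$, are symmetric and almost everywhere positive definite
matrices.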





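Consider, for any $j = 1, 2, \ldots, k$,

(5.6)   $\lambda_{j1} \le \dfrac{c_{\min}(S_j^*)}{c_{\max}(S_0^*)} \le \dfrac{c_{\max}(S_j^*)}{c_{\min}(S_0^*)} \le \lambda_{j2},$

which is equivalent to

$\dfrac{1}{\lambda_{j1}} \ge \dfrac{c_{\max}(S_j^{*-1})}{c_{\min}(S_0^{*-1})} \ge \dfrac{c_{\min}(S_j^{*-1})}{c_{\max}(S_0^{*-1})} \ge \dfrac{1}{\lambda_{j2}},$

or to

(5.7)   $\dfrac{1}{\lambda_{j1}} \ge \dfrac{c_{\max}(D_{\gamma_j} A_j' S_j^{-1} A_j)}{c_{\min}(D_{\gamma_0} A_0' S_0^{-1} A_0)} \ge \dfrac{c_{\min}(D_{\gamma_j} A_j' S_j^{-1} A_j)}{c_{\max}(D_{\gamma_0} A_0' S_0^{-1} A_0)} \ge \dfrac{1}{\lambda_{j2}}.$

It is known that if $A_1(p \times p)$ and $A_2(p \times p)$ are two matrices such that $A_1$ is
symmetric positive definite and $A_2$ is symmetric at least positive semi-definite,
then $c_{\max}(A_1)\, c_{\max}(A_2) \ge c(A_1 A_2) \ge c_{\min}(A_1)\, c_{\min}(A_2)$. Using this, we have
$c_{\max}(D_{\gamma_j} A_j' S_j^{-1} A_j)\, c_{\max}(A_j' S_j A_j) \ge c_{\max}(D_{\gamma_j})$, so that

$c_{\max}(D_{\gamma_j} A_j' S_j^{-1} A_j) \ge \dfrac{c_{\max}(D_{\gamma_j})}{c_{\max}(A_j' S_j A_j)} = \dfrac{c_{\max}(D_{\gamma_j})}{c_{\max}(S_j)},$

and similarly,

$c_{\min}(D_{\gamma_0} A_0' S_0^{-1} A_0) \le \dfrac{c_{\min}(D_{\gamma_0})}{c_{\min}(A_0' S_0 A_0)} = \dfrac{c_{\min}(D_{\gamma_0})}{c_{\min}(S_0)},$

and hence we see that the first part of (5.7) implies that

$\dfrac{1}{\lambda_{j1}} \ge \dfrac{c_{\max}(D_{\gamma_j})\, c_{\min}(S_0)}{c_{\min}(D_{\gamma_0})\, c_{\max}(S_j)}.$

Again, using the above mentioned result, we note that

$c_{\min}(D_{\gamma_j} A_j' S_j^{-1} A_j) \le \dfrac{c_{\min}(D_{\gamma_j})}{c_{\min}(S_j)}, \qquad c_{\max}(D_{\gamma_0} A_0' S_0^{-1} A_0) \ge \dfrac{c_{\max}(D_{\gamma_0})}{c_{\max}(S_0)},$

so that the second part of (5.7) implies that

$\dfrac{c_{\min}(D_{\gamma_j})\, c_{\max}(S_0)}{c_{\max}(D_{\gamma_0})\, c_{\min}(S_j)} \ge \dfrac{1}{\lambda_{j2}}.$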
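Therefore, we observe that (5.7) implies the statement,

(5.8)   $\dfrac{1}{\lambda_{j1}} \dfrac{c_{\max}(S_j)}{c_{\min}(S_0)} \ge \dfrac{c_{\max}(D_{\gamma_j})}{c_{\min}(D_{\gamma_0})} \ge \dfrac{c_{\min}(D_{\gamma_j})}{c_{\max}(D_{\gamma_0})} \ge \dfrac{1}{\lambda_{j2}} \dfrac{c_{\min}(S_j)}{c_{\max}(S_0)}.$

We also have

$\dfrac{c_{\max}(D_{\gamma_j})}{c_{\min}(D_{\gamma_0})} = c_{\max}(D_{\gamma_j})\, c_{\max}(D_{1/\gamma_0}) = c_{\max}(\Sigma_j)\, c_{\max}(\Sigma_0^{-1}) \ge c_{\max}(\Sigma_j \Sigma_0^{-1}),$

and,

$\dfrac{c_{\min}(D_{\gamma_j})}{c_{\max}(D_{\gamma_0})} = c_{\min}(D_{\gamma_j})\, c_{\min}(D_{1/\gamma_0}) = c_{\min}(\Sigma_j)\, c_{\min}(\Sigma_0^{-1}) \le c_{\min}(\Sigma_j \Sigma_0^{-1}).$

Therefore, we observe that (5.8) implies the confidence statement,

(5.9)   $\dfrac{1}{\lambda_{j1}} \dfrac{c_{\max}(S_j)}{c_{\min}(S_0)} \ge c_{\max}(\Sigma_j \Sigma_0^{-1}) \ge c_{\min}(\Sigma_j \Sigma_0^{-1}) \ge \dfrac{1}{\lambda_{j2}} \dfrac{c_{\min}(S_j)}{c_{\max}(S_0)}.$

Combining all statements like (5.9) for $j = 1, 2, \ldots, k$, we see that the
statements (5.5) imply the simultaneous confidence statements,

(5.10)   $\dfrac{1}{\lambda_{11}} \dfrac{c_{\max}(S_1)}{c_{\min}(S_0)} \ge \text{all } c(\Sigma_1 \Sigma_0^{-1}) \ge \dfrac{1}{\lambda_{12}} \dfrac{c_{\min}(S_1)}{c_{\max}(S_0)},$

$\qquad \vdots$

$\qquad \dfrac{1}{\lambda_{k1}} \dfrac{c_{\max}(S_k)}{c_{\min}(S_0)} \ge \text{all } c(\Sigma_k \Sigma_0^{-1}) \ge \dfrac{1}{\lambda_{k2}} \dfrac{c_{\min}(S_k)}{c_{\max}(S_0)},$

with a joint confidence coefficient $\ge (1 - \alpha)$.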
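As an illustration of how bounds of the form (5.10) would be evaluated in practice, here is a minimal Python sketch for one pair $(S_j, S_0)$. It is restricted to $p = 2$ so that the extreme characteristic roots have a closed form and no numerical library is needed; the function names are hypothetical, and the constants $\lambda_{j1}$, $\lambda_{j2}$ are assumed to be given (the text notes that the probability integral still needs tabulation).

```python
import math


def extreme_roots_2x2(m):
    """Smallest and largest characteristic roots of a symmetric 2 x 2 matrix."""
    (a, b), (c, d) = m
    if b != c:
        raise ValueError("matrix must be symmetric")
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mean - radius, mean + radius


def characteristic_root_bounds(s_j, s_0, lam1, lam2):
    """Lower and upper bounds on all roots of Sigma_j Sigma_0^{-1}.

    lam1, lam2 play the role of lambda_j1, lambda_j2 and are assumed
    to be supplied by the user.
    """
    if not 0 < lam1 < lam2:
        raise ValueError("need 0 < lam1 < lam2")
    cmin_j, cmax_j = extreme_roots_2x2(s_j)
    cmin_0, cmax_0 = extreme_roots_2x2(s_0)
    if cmin_j <= 0 or cmin_0 <= 0:
        raise ValueError("sample dispersion matrices must be positive definite")
    lower = cmin_j / (lam2 * cmax_0)  # (1 / lambda_j2) c_min(S_j) / c_max(S_0)
    upper = cmax_j / (lam1 * cmin_0)  # (1 / lambda_j1) c_max(S_j) / c_min(S_0)
    return lower, upper
```

For general $p$ the helper `extreme_roots_2x2` would be replaced by any routine returning the smallest and largest characteristic roots of a symmetric matrix; the rest of the computation is unchanged.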


6. Concluding remarks. Hartley’s test is more involved but has a more de- 
tailed structure of alternatives. A generalization of Hartley’s test to the case 
of unequal sample sizes, and a multivariate extension of that, are under investiga- 
tion and will be discussed in a later paper. 


7. Acknowledgement. The author wishes to express his indebtedness to Prof.
S. N. Roy for his help and guidance in preparing this paper.


REFERENCES 


[1] R. Gnanadesikan, "Further contributions to multivariate analysis including univariate
and multivariate variance components analysis and factor analysis," Institute
of Statistics, University of North Carolina, Mimeo. Series No. 158, 1956.

[2] H. O. Hartley, "The maximum F-ratio as a short cut test for heterogeneity of
variance," Biometrika, Vol. 37 (1950), pp. 308-312.

[3] K. V. Ramachandran, "On the simultaneous analysis of variance test," Ann. Math.
Stat., Vol. 27 (1956), pp. 521-528.

[4] S. N. Roy, "On a heuristic method of test construction and its uses in multivariate
analysis," Ann. Math. Stat., Vol. 24 (1953), pp. 220-238.

[5] S. N. Roy, Some Aspects of Multivariate Analysis, John Wiley and Sons, New York, 1957.

[6] S. N. Roy, "Some further results in simultaneous confidence interval estimation," Ann.
Math. Stat., Vol. 25 (1954), pp. 752-761.

[7] S. N. Roy, "A note on 'Some further results in simultaneous confidence interval
estimation,'" Ann. Math. Stat., Vol. 27 (1956), pp. 856-858.

[8] S. N. Roy and R. Gnanadesikan, "Further contributions to multivariate confidence
bounds," Biometrika, Vol. 44 (1957), pp. 399-410.





ON THE THEORY OF BAN ESTIMATES¹


By Robert A. Wijsman²
University of California, Berkeley 


1. Introduction and summary. The notion of best asymptotically normal
estimates (BAN estimates for short³) was introduced by Neyman [8] in the
multinomial case. Applications have been made in biological problems, notably
in bio-assay [2], [4], [5]. Generalizations of Neyman's work have been made by
Barankin and Gurland [1], Chiang [3] and Ferguson [5]. The usual theory of
BAN estimates requires differentiability of the estimates, and imposes rather
strong conditions on certain functions given in advance (the functions $\zeta$ and $\Sigma$
of Section 3). In this note a different definition of BAN estimates is made which
does not require differentiability, at the same time relaxing the conditions on
$\zeta$ and $\Sigma$, whereas in essence all important theorems in the theory of BAN
estimates are retained.


2. Notation. Convergence in probability is denoted by $\xrightarrow{P}$. $X_n \sim Y_n$ means
$X_n - Y_n \xrightarrow{P} 0$. $X_n \xrightarrow{\mathcal{L}} \mathfrak{N}(0, \Sigma)$ means that the law of $X_n$ tends to a multivariate
normal law with mean 0 and covariance matrix $\Sigma$. $A'$ is the transpose of a matrix
$A$, $I_m$ the $m \times m$ identity matrix. We shall write regular (1), regular (2), BAN (1),
BAN (2), depending on whether Definition 1 or Definition 2 of Section 3 is used.
For notation and terminology not explained here see Chiang's paper [3], with
which the notation in this note is in fair agreement.


3. The usual definition and a new definition of BAN estimates. Let $Z_n$ be a
sequence of random vectors, taking values in a space $Z$ which is a subspace of
a $k$-dimensional Euclidean space $R^k$. The distribution of the $Z_n$ depends on a
parameter $\theta$ which takes values in an open subset $\Omega$ of an $m$-dimensional
Euclidean space, where $m \le k$. The true value of $\theta$ will be denoted by $\theta_0$. It is
assumed that

(1)   $\sqrt{n}\,(Z_n - \zeta(\theta_0)) \xrightarrow{\mathcal{L}} \mathfrak{N}(0, \Sigma(\theta_0))$

in which $\zeta$ and $\Sigma$ are functions on $\Omega$, $\zeta$ into $Z$ and $\Sigma$ into the space of $k \times k$
positive semi-definite matrices. Let the set $\zeta(\Omega)$ be denoted by $U$. In the simplest
theory of BAN estimates, an admissible estimate is a function $\hat\theta$ from $Z$ to $\Omega$,
and if $Z_n$ is observed then $\theta$ is estimated by $\hat\theta(Z_n)$. We shall occasionally write
$\hat\theta_n$ instead of $\hat\theta(Z_n)$.

Received November 11, 1957; revised November 7, 1958.

¹ This investigation was supported (in part) by a research grant (No. G-3666) from the
National Institutes of Health, Public Health Service.

² Now at the University of Illinois.

³ Some authors prefer "RBAN," where "R" stands for regular. In this note, regularity
of a BAN estimate is part of the definition.

In the usual theory of BAN estimates the following assumptions
and definitions are made (our Definition 1 differs slightly from the one
given in [3] but leads to the same definition of BAN (1)):

ASSUMPTION 1. (i) $\zeta$ is 1-1 and bicontinuous; (ii) $\Sigma(\theta)$ is nonsingular for every
$\theta$; (iii) $\zeta$ and $\Sigma$ have continuous second derivatives; (iv) the matrix $\partial\zeta/\partial\theta$ is of
rank $m$ for every $\theta$.

DEFINITION 1. $\hat\theta$ is called regular (1) if (i) $\hat\theta(Z_n) \xrightarrow{P} \theta_0$ whatever $\theta_0$, that is, $\hat\theta$
is consistent; (ii) $\partial\hat\theta/\partial z$ exists and is continuous in a neighborhood of $U$.

We shall denote $\zeta(\theta_0) = \zeta_0$, $\Sigma(\theta_0) = \Sigma_0$. The $k \times m$ matrix derivative $\partial\zeta/\partial\theta$
will be denoted by $V(\theta)$, the $m \times k$ matrix derivative $\partial\hat\theta/\partial z$ by $A(z)$. $V(\theta_0)$ and
$A(\zeta_0)$ will be written $V_0$, $A_0$ respectively. An immediate consequence of
Assumption 1, Definition 1 and (1) is that

(2)   $\hat\theta(\zeta(\theta)) = \theta$ identically in $\theta$.

This enables us to show that $\hat\theta_n$ is asymptotically normal, with asymptotic
covariance matrix $A_0 \Sigma_0 A_0'$. From (2) one obtains by differentiation

(3)   $A(\zeta(\theta))\,V(\theta) = I_m$ identically in $\theta$,

which is a very important relation since it has as a consequence that among all
matrices of the form $A_0 \Sigma_0 A_0'$ there is a minimal one. A regular (1) estimate with
minimal covariance matrix is called BAN (1).
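The step from (3) to the existence of a minimal covariance matrix can be spelled out as follows. This is a standard argument given here only as a sketch; the auxiliary matrix $M_0$ is notation introduced for this illustration and does not appear in the paper. It uses the nonsingularity of $\Sigma_0$ (Assumption 1(ii)) and the fact that $V_0$ has rank $m$ (Assumption 1(iv)), so that $V_0' \Sigma_0^{-1} V_0$ is invertible.

```latex
% M_0 is a hypothetical auxiliary matrix, introduced for illustration only.
\[
  M_0 = (V_0' \Sigma_0^{-1} V_0)^{-1} V_0' \Sigma_0^{-1},
  \qquad M_0 V_0 = I_m .
\]
% For any A_0 with A_0 V_0 = I_m, as guaranteed by (3):
\[
  A_0 \Sigma_0 M_0' = A_0 V_0 (V_0' \Sigma_0^{-1} V_0)^{-1}
                    = (V_0' \Sigma_0^{-1} V_0)^{-1},
  \qquad
  M_0 \Sigma_0 M_0' = (V_0' \Sigma_0^{-1} V_0)^{-1},
\]
% so that expanding the quadratic form gives
\[
  (A_0 - M_0)\, \Sigma_0\, (A_0 - M_0)'
  = A_0 \Sigma_0 A_0' - (V_0' \Sigma_0^{-1} V_0)^{-1} .
\]
```

Since the left-hand side is positive semi-definite, every matrix $A_0 \Sigma_0 A_0'$ with $A_0 V_0 = I_m$ exceeds $(V_0' \Sigma_0^{-1} V_0)^{-1}$ by a positive semi-definite matrix, which is the sense in which the latter is minimal.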

A priori there is no reason why an estimate should have a derivative, apart
from the convenience with which one can show the existence of a minimum
covariance matrix among the covariance matrices of regular (1) estimates. The
differentiability conditions in Assumption 1(iii) also seem stronger than they
need be. The reason for assuming so much differentiability is to ensure the
differentiability of $\hat\theta$ if the latter is generated by minimizing a quadratic form [3]
or by the root of a linear form [5]. Chiang [3] in his Theorem 5 makes the full
Assumption 1(iii). In his Theorem 6 he only assumes that $\zeta$ has continuous second
derivatives, and makes no assumptions on $\Sigma$. However, the type of estimates
allowed there goes beyond the simplest theory of BAN estimates as treated
in the present paper, since $\hat\theta$ is allowed to be a function of both $Z_n$ and $S_n$, where
$S_n$ is a consistent estimate of $\Sigma$. If $S_n$ is taken as a function of $Z_n$, which is
usually the case in applications, it has to be a differentiable function in order
that $\hat\theta$ is differentiable. Then, since $Z_n \xrightarrow{P} \zeta_0$, we must have $S_n(\zeta_0) = \Sigma_0$,
whatever be $\theta_0$; that is, $S_n(\zeta(\theta)) = \Sigma(\theta)$, so that $\Sigma$ has continuous first derivatives.
If a regular (1) estimate is to be obtained as a root of a linear form (equation (5)
in the present paper) then the matrix $B$ in this linear form has to have continuous
first partial derivatives with respect to $z$ and $\theta$. If, in addition, this regular (1)
estimate is to have minimum covariance matrix, then the matrix $B$ has to
satisfy $B(\zeta(\theta), \theta) = V'(\theta)\Sigma^{-1}(\theta)$ for all $\theta$ (the more general condition given in
[5] reduces to the one given here if $\Sigma$ is non-singular). It follows then
that $V'(\theta)\Sigma^{-1}(\theta)$ has to have continuous first derivatives with respect to $\theta$, and
the most natural way to achieve this is to have both $V$ and $\Sigma$ continuously
differentiable. Thus, we see that in order to generate BAN (1) estimates it is
practically necessary to require that $\Sigma$ is continuously differentiable and $\zeta$
continuously twice differentiable.

It should be said at once that Ferguson in [5] does not require the estimates
to be regular (1), which allows at the same time for relaxation of the conditions
on $\Sigma$ and $\zeta$. The class of estimates considered in [5] consists of all those
estimates which can be obtained as a suitably chosen root of a linear form (our
equation (5)), for the various choices of the matrix $B$. The restrictions on $B$
(continuity in $z$ and differentiability in $\theta$) ensure the continuity of $\hat\theta$ in a
neighborhood of $U$. The corresponding class of covariance matrices contains the
minimum member $(V_0' \Sigma_0^{-1} V_0)^{-1}$, which can be attained if $\Sigma$ and $V$ are continuous
in $\theta$ (our Assumption 2(iii)). This minimal covariance matrix has the same form
as in the case of regular (1) estimates, so that Ferguson's BAN estimates have
the same asymptotic properties as BAN (1) estimates whenever they exist.

It may be argued that from a theoretical point of view it is slightly
unsatisfactory to define a class of estimates by the manner in which they are generated,
rather than by their common properties. The approach in this paper is different
from Ferguson's in that a class of estimates will be defined by the properties
they are to have, in the same way as this is done in the case of regular (1)
estimates, but with a weaker definition of regular. The new definition of regular
(Definition 2, below) will henceforth be referred to as regular (2). In spite of
this weakening, all important theorems remain valid, in particular the existence
of the minimum covariance matrix $(V_0' \Sigma_0^{-1} V_0)^{-1}$. The Assumptions 1(iii) on $\Sigma$
and $\zeta$ will also be relaxed. The only differentiability condition retained is that
$V = \partial\zeta/\partial\theta$ exists and is continuous. This condition is certainly a very natural
one, since the matrix $V$ plays an important role in the theory of BAN estimates.
Thus, compared to the theory of BAN (1) estimates, the field of applicability
is enlarged, and at the same time the theory becomes somewhat neater since
the conditions are better tuned to the essentials in the theory. The class of
regular (2) estimates contains both the regular (1) and Ferguson's estimates.
Furthermore, the asymptotic properties of BAN (2) estimates are the same as
those of BAN (1) estimates whenever the latter exist. Thus, the class of
estimates to be considered is enlarged, and if a BAN (1) estimate exists, it still
belongs to the best estimates in this class.

ASSUMPTION 2. (i), (ii) and (iv) are the same as Assumptions 1(i), 1(ii) and
1(iv); (iii) $\Sigma$ is continuous and $\zeta$ has a continuous first derivative $V$.

DEFINITION 2. $\hat\theta$ will be called regular (2) if (i) $\hat\theta$ is continuous in each point
of $U$; (ii) for each $\theta$ there exists an $m \times k$ matrix $A(\theta)$, continuous in $\theta$, such
that


(4)   $\sqrt{n}\,(\hat\theta(Z_n) - \theta_0) \sim A(\theta_0)\,\sqrt{n}\,(Z_n - \zeta_0)$ whatever $\theta_0$.


Definition 2 is weaker than Definition 1, in that the former is implied by the
latter, but not vice versa. In fact, Definition 2 does not even guarantee the
continuity of $\hat\theta$ in a neighborhood of $U$. This causes a little trouble, since $\hat\theta$ is
then not necessarily measurable. For the reinterpretation of various
probability statements the reader is referred to a remark by LeCam ([6], p. 132).
With Definition 2 it is easy to show the validity of (2). The following theorem,
however, is not immediate, and will be proved in Section 4.

THEOREM 1. $A(\theta)V(\theta) = I_m$ identically in $\theta$.

Theorem 1 takes the place of (3), and $A(\theta_0)$ takes the place of what was
previously called $A_0$. Thus, it is seen that the classes of covariance matrices of
regular (1) and regular (2) estimates coincide. A regular (2) estimate with
minimal covariance matrix will be called BAN (2), and, with a few modifications,
the whole theory of BAN estimates is unchanged.

The next theorem is essentially due to Ferguson [5], except for the weaker
conditions, and shows how to generate a BAN (2) estimate as a root of a linear
form.

THEOREM 2. Let $B(z, \theta)$ be an $m \times k$ matrix, continuous in $\theta$ for each $z$ and
continuous in $(z, \theta)$ at each point $(\zeta(\theta), \theta)$, such that $B_0 V_0$ is nonsingular whatever
$\theta_0$, where $B_0 = B(\zeta_0, \theta_0)$. Then there exists a neighborhood $N$ of $U$ and a function
$\hat\theta$ on $N$ to $\Omega$ such that on $N$, $\hat\theta(z)$ satisfies the equation

(5)   $B(z, \theta)(z - \zeta(\theta)) = 0$

for $\theta$ and such that $\hat\theta$ is a regular (2) estimate. Furthermore, we have

(6)   $\sqrt{n}\,(\hat\theta_n - \theta_0) \sim (B_0 V_0)^{-1} B_0\, \sqrt{n}\,(Z_n - \zeta_0).$

Lastly, if $B_0 = V_0' \Sigma_0^{-1}$ then $\hat\theta$ is BAN.

The proof of Theorem 2 proceeds along conventional lines and will not be
reproduced here. It relies on Brouwer's fixed point theorem [7], in which the
transformation is assumed to be continuous but not necessarily differentiable, and,
as a consequence, uniqueness of the fixed point cannot be concluded. As a result,
the estimate $\hat\theta$ is not necessarily unique. The matrix $B$ may be chosen to be
$V'(\theta)\Sigma^{-1}(\theta)$ and will then satisfy the conditions of Theorem 2, by Assumption
2(iii). Moreover, this choice for $B$ will generate a BAN (2) estimate, by the last
part of Theorem 2. The same conclusions hold if $V(\theta)$ is replaced by a matrix
$V^*(z)$, depending only on $z$, such that $V^*$ is continuous in each point of $U$, and
$V^*(\zeta(\theta)) = V(\theta)$ for all $\theta$. $\Sigma(\theta)$ may be replaced similarly.
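To make equation (5) concrete, the following Python sketch solves it numerically in a toy case. Everything specific here is an assumption made for illustration and is not taken from the paper: $m = 1$, $k = 2$, $\zeta(\theta) = (\theta, \theta^2)$, $\Sigma = I_2$, and $B = V'\Sigma^{-1} = (1, 2\theta)$. Plain bisection on a bracketing interval is used to locate a root; it stands in for, and is not, the fixed point argument on which the proof of Theorem 2 relies.

```python
def zeta(theta):
    """Hypothetical model: zeta(theta) = (theta, theta^2)."""
    return (theta, theta * theta)


def b_row(theta):
    """B = V' Sigma^{-1} with Sigma = I_2, so B is the row vector (1, 2 theta)."""
    return (1.0, 2.0 * theta)


def linear_form(z, theta):
    """Left-hand side of (5): B(z, theta)(z - zeta(theta)), a scalar here."""
    b = b_row(theta)
    zt = zeta(theta)
    return b[0] * (z[0] - zt[0]) + b[1] * (z[1] - zt[1])


def solve_linear_form(z, lo=0.0, hi=1.0, tol=1e-12, max_iter=200):
    """Find a root of (5) in [lo, hi] by bisection; the bracket is assumed."""
    f_lo = linear_form(z, lo)
    f_hi = linear_form(z, hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("the linear form does not change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = linear_form(z, mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

When $z$ lies exactly on $U$, say $z = \zeta(0.5) = (0.5, 0.25)$, the value $\theta = 0.5$ solves (5), in agreement with (2). For a point $z$ near $U$ the root is the estimate $\hat\theta(z)$ in this toy model.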


4. Proof of Theorem 1. We shall need the following lemma, whose proof is
due to Dr. Lucien M. LeCam.

LEMMA 1. Let $(a, b)$ be a one-dimensional interval in $R^k$, and suppose that for
each $x \in (a, b)$ there is a sphere $S(x)$ about $x$ with radius $r(x)$. Then there are two
distinct points, $x_1$ and $x_2$, in $(a, b)$ such that $x_1 \in S(x_2)$ and $x_2 \in S(x_1)$.

PROOF. Let $x_0 \in (a, b)$ be a point such that there is no $x \in (a, b) \cap S(x_0)$ for
which $x_0 \in S(x)$. Then $r(x) \to 0$ as $x \to x_0$. On the other hand, $r(x_0) > 0$. It
follows that $r$ has a discontinuity of the first kind at $x_0$. The conclusion follows
from the fact that the discontinuities of the first kind are denumerable.

We proceed to prove Theorem 1. Suppose that for some $\theta_1 \in \Omega$, $A(\theta_1)V(\theta_1) \neq I_m$.
Without loss of generality we may assume that $\zeta(\theta_1)$ is the origin 0 of the
coordinate system in $Z$. Let $T$ be the tangent plane to $U$ in 0, and consider in the
following $T$ fixed, i.e. not subject to transformations. There is a transformation
$z \to z^* = g(z)$ which in a $Z$-neighborhood of 0 is continuously differentiable in
both directions, with $\partial g(0)/\partial z = I_k$, and which maps $U$ into $T$. More precisely,
there is a $U$-neighborhood $N_u$ of 0, a $T$-neighborhood $N_t$ of 0, and a $Z$-neighborhood
$N_z$ of $N_u$, such that on $N_z$ the transformation $g$ has the differentiability
properties mentioned in the preceding sentence, and such that $g(N_u) = N_t$.
Let $N_\theta = \zeta^{-1}(N_u)$. In (4) we shall only consider values of $\theta_0$ which are in $N_\theta$.
The estimate $\hat\theta$ is now to be considered as a function of $z^*$. If we write (4) down
for the transformed variables, i.e. replacing $Z_n$ by $Z_n^*$, $\zeta_0$ by $\zeta_0^*$, we have to
replace $A(\theta)$ by $A^*(\theta) = A(\theta)\,(\partial g(\zeta_0)/\partial z)^{-1}$. $A^*(\theta)$, like $A(\theta)$, is continuous in
$\theta$. Since $\partial g(\zeta(\theta_1))/\partial z = I_k$, we have $A^*(\theta_1) = A(\theta_1)$. Furthermore, for the
transformed variables we have $V^*(\theta) = \partial\zeta^*/\partial\theta$, where $\zeta^*(\theta) = g(\zeta(\theta))$, so that
$V^* = (\partial g/\partial z)V$ (the arguments have been suppressed). At $\theta_1$ we have
$V^*(\theta_1) = V(\theta_1)$. Thus, if $A(\theta_1)V(\theta_1) \neq I_m$, then also $A^*(\theta_1)V^*(\theta_1) \neq I_m$.
Dropping the asterisks, we consider a new problem, in which the new $\Omega$ is the old
$N_\theta$, the new $U$ is the old $N_t$. Hence $U$ is a subset of an $m$-dimensional subspace
of $R^k$. For some $\theta_1 \in \Omega$, $A(\theta_1)V(\theta_1) \neq I_m$. We may further simplify the problem
by making a suitable transformation $\theta \to \theta^*$, continuously differentiable in both
directions. This transforms $V \to V^* = V(\partial\theta/\partial\theta^*)$ and $A \to A^* = (\partial\theta^*/\partial\theta)A$
(the arguments have been suppressed). Thus, if $AV \neq I_m$, then also $A^*V^* \neq I_m$.
We choose the transformation $\theta \to \theta^* = \zeta(\theta)$. Dropping the asterisks, in the
new problem $\Omega$ and $U$ are identical, $\zeta$ is the identity function on $U$, and $\hat\theta$ is a
function on $Z$ to $U$ which is the identity function on $U$, by (2). For each $u \in U$,
$A(u)$ is a linear transformation of $Z$ into the $m$-dimensional subspace in which
$U$ is embedded. If Theorem 1 is true, then, for each $u$, $A(u)$ is the identity
transformation on $U$. We have assumed that $A(0)$ is not the identity transformation
on $U$, and will show that this leads to a contradiction.

If $A(0)$, restricted to $U$, is not the identity transformation, then the same is
true on at least one of the coordinate axes in $U$. We choose one of these
coordinate axes, and call it the $x$-axis in the following. For simplicity, $x$, with or without
subscripts, will stand both for a point on the $x$-axis and for its $x$-coordinate. In
the following, $|z|$ will denote the norm of a vector $z$, $\|A\|$ the norm of a matrix
$A$, $I$ the identity transformation on $U$. Using the continuity of $A$, there is on
the $x$-axis an interval $N_x = (-a, a)$ and there is an $\varepsilon > 0$ such that for all
$\xi, x_1, x_2 \in N_x$ we have

(7)   $|(A(\xi) - I)(x_2 - x_1)| \ge 6\varepsilon\,|x_2 - x_1|.$

Furthermore, we can choose $a$ so small that for all $x_1, x_2 \in N_x$ we have

(8)   $\|A(x_2) - A(x_1)\| < \varepsilon.$

We now put

(9)   $f(z, u) = \hat\theta(z) - u - A(u)(z - u),$

then from (4) it follows that

(10)   $P\{\sqrt{n}\,|f(Z_n, u_0)| \ge \varepsilon\} \to 0,$

in which $u_0$ is the true value of the parameter. Denote by $S(u, n)$ the open
sphere with radius $n^{-1/2}$ about $u$. If $u = 0$ we write simply $S(n)$. Using (10) we
have

(11)   $P\{Z_n \in S(u_0, n),\ \sqrt{n}\,|f(Z_n, u_0)| \ge \varepsilon\} \to 0.$

Making the transformation $y = n^{1/2}(z - u_0)$, $Y_n = n^{1/2}(Z_n - u_0)$, $g_n(y) =
n^{1/2} f(z, u_0)$, we can write (11) as

(12)   $P\{Y_n \in S(1),\ |g_n(Y_n)| \ge \varepsilon\} \to 0.$

Let $\mu$ be the Lebesgue measure in $R^k$. Since $Y_n$ has a limiting density with respect
to $\mu$, we conclude from (12)

(13)   $\mu\{y \in S(1),\ |g_n(y)| \ge \varepsilon\} \to 0.$

If we divide the left-hand side of (13) by the constant $\mu S(1) = \mu\{y \in S(1)\}$,
and make the transformation back from $y$ to $z$, we obtain

(14)   $[\mu S(n)]^{-1}\, \mu\{z \in S(u, n),\ |f(z, u)| \ge \varepsilon n^{-1/2}\} \to 0$

in which we have dropped the subscript 0 on $u$.
For any $z$, let $z = x_z + y_z$, where $x_z$ is the $x$-component of $z$. We choose a
number $\alpha > 0$ in such a way that for any $x_1, x_2, n$, with

(15)   $\tfrac{1}{2} n^{-1/2} < x_2 - x_1 < n^{-1/2}$

we have

(16)   $\mu\{z \in S(x_1, n) \cap S(x_2, n),\ x_1 < x_z < x_2\} > \alpha\,\mu S(n).$

The number $\alpha$ can obviously be chosen independently of the particular choice
of $x_1, x_2, n$, so long as (15) holds. Restricting now $u$ in (14) to points $x$ in $N_x$,
we have for each $x \in N_x$ an integer $n_x$ such that

(17)   $\mu\{z \in S(x, n),\ |f(z, x)| \ge \varepsilon n^{-1/2}\} < \tfrac{1}{2}\alpha\,\mu S(n)$

provided $n \ge n_x$. According to Lemma 1, there are two points $x_1, x_2 \in N_x$,
with $x_1 < x_2$, such that $x_1 \in S(x_2, n_{x_2})$ and $x_2 \in S(x_1, n_{x_1})$. We can now choose
an integer $n \ge \max(n_{x_1}, n_{x_2})$ such that (15) holds, and therefore also (16). The
two equations (16) and (17) together imply that

(18)   $\mu\{z \in S(x_1, n) \cap S(x_2, n),\ x_1 < x_z < x_2,\ |f(z, x_1)| < \varepsilon n^{-1/2},\ |f(z, x_2)| < \varepsilon n^{-1/2}\} > 0,$

so that the set in braces in (18) is not empty. Choosing any point $z$ in this set,
and using (15), we have the following inequalities:

(19)   $x_1 < x_z < x_2,$

(20)   $|y_z| < 2(x_2 - x_1),$

(21)   $|f(z, x_1)| < 2\varepsilon(x_2 - x_1), \quad |f(z, x_2)| < 2\varepsilon(x_2 - x_1).$

From (9) we compute

(22)   $f(z, x_2) - f(z, x_1) = (A(x_1) - I)(x_z - x_1) + (A(x_2) - I)(x_2 - x_z) + (A(x_1) - A(x_2))\,y_z.$

By continuity of $A$ and by virtue of (19) there is a point $\xi \in (x_1, x_2)$ such that

(23)   $|(A(\xi) - I)(x_2 - x_1)| = |(A(x_1) - I)(x_z - x_1) + (A(x_2) - I)(x_2 - x_z)|.$

Using (8), (20), (21), (22) and (23), we have finally

(24)   $|(A(\xi) - I)(x_2 - x_1)| < 6\varepsilon(x_2 - x_1),$

contradicting (7). Q.E.D.


Acknowledgment. The author is indebted to Dr. Lucien M. LeCam for sug- 
gesting the proof of Lemma 1, and to the referee for helpful comments. 


REFERENCES 


[1] E. W. Barankin and J. Gurland, "On asymptotically normal, efficient estimators
I," University of California Publications in Statistics, Vol. 1 (1950), pp. 89-130.

[2] J. Berkson, "Application of the logistic function to bio-assay," J. Am. Stat. Assn.,
Vol. 39 (1944), pp. 357-365.

[3] C. L. Chiang, "On regular best asymptotically normal estimates," Ann. Math. Stat.,
Vol. 27 (1956), pp. 336-351.

[4] C. L. Chiang, "An application of stochastic processes to experimental studies on
flour beetles," Biometrics, Vol. 13 (1957), pp. 79-97.

[5] T. S. Ferguson, "A method of generating best asymptotically normal estimates with
application to the estimation of bacterial densities," Dissertation, University
of California, Berkeley, 1956; Ann. Math. Stat., Vol. 29 (1958), pp. 1046-1062.

[6] L. M. LeCam, "On the asymptotic theory of estimation and testing hypotheses,"
Proceedings of the Third Berkeley Symposium on Mathematical Statistics and
Probability, University of California Press, 1956, pp. 129-156.

[7] S. Lefschetz, Introduction to Topology, Princeton University Press, 1949.

[8] J. Neyman, "Contribution to the theory of the $\chi^2$ test," Proceedings of the Berkeley
Symposium on Mathematical Statistics and Probability, University of California
Press, 1949, pp. 239-273.





ESTIMATION OF THE MEDIANS FOR DEPENDENT
VARIABLES


By Olive Jean Dunn¹
Statistical Laboratory, Iowa State College 


1. Summary. Joint intervals of bounded confidence are suggested for the 
medians of a bivariate population with continuous marginal distributions. 
The two intervals are of the classic type based on sample order statistics. 


2. Introduction. The problem considered in this paper is that of using a non- 
parametric method to estimate by a confidence set the unknown medians of 
two dependent variables. In various types of research, it is convenient to con- 
sider a sample of n individuals and to take measurements on the same n in- 
dividuals at two different times or at two different levels of treatment. The two 
measurements on the same individual cannot be assumed to be independent, 
so that it is appropriate to consider the 2n measurements as a sample of size 
n from a bivariate distribution. 

Let the two variables $y_1$, $y_2$ with medians $\nu_1$, $\nu_2$ have the c.d.f. (cumulative
distribution function) $F(y_1, y_2)$. By a set of simultaneous confidence intervals
of bounded confidence level $1 - \alpha$ for $\nu_1$, $\nu_2$ is meant a set of four functions of
the sample values, say $g_1$, $g_2$, $h_1$, $h_2$, such that


$P(g_1 < \nu_1 < h_1,\ g_2 < \nu_2 < h_2) \ge 1 - \alpha.$


The probability relationship must hold for all underlying distributions in a 
specified set of distributions. In this paper the specified set will consist of all 
bivariate distributions whose marginals have continuous c.d.f.’s. 

The method used in this paper to obtain confidence intervals uses order 
statistics and requires only the assumption that the marginal distributions be 
continuous. 


3. Confidence intervals for the medians of a bivariate distribution. Let the
c.d.f. of the variables $y_1, y_2$ be $F(y_1, y_2)$ and let the two marginal distributions
be $F_1(y_1)$ and $F_2(y_2)$, both of which are continuous.

A random sample of n observations will be denoted by $(y_{11}, y_{21}), \ldots, (y_{1n}, y_{2n})$.
For i = 1 and 2, the set $y_{i1}, \ldots, y_{in}$ will be reordered from
smallest to largest and renamed $z_{i1}, \ldots, z_{in}$. Thus $z_{i1} \le z_{i2} \le \cdots \le z_{in}$ for
i = 1 or 2. The $z_{1j}$ and $z_{2j}$ need not belong to the same observation.

Two positive integers, r and s, such that 2r + s = n, are selected. Let $E_i$ be
the event that $z_{i,r} < \nu_i < z_{i,r+s+1}$, for i = 1, 2. Then, for i = 1, 2,


$$P(E_i) = \sum_{j=r}^{r+s} \binom{n}{j} \left(\tfrac{1}{2}\right)^n = (1 - \alpha)^{1/2}, \text{ say.} \tag{1}$$
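As a numerical illustration of equation (1), the sketch below evaluates the binomial sum for given n and r, with s = n - 2r. The function names are hypothetical and chosen only for this example; the two bounds returned by `joint_level_bounds` are the values that $P(E_i)^2$ and $P(E_i)$ take for the chosen n and r.

```python
from math import comb


def marginal_coverage(n, r):
    """P(E_i) from equation (1): sum of C(n, j) / 2^n for j = r, ..., r + s."""
    s = n - 2 * r
    if r < 1 or s < 1:
        raise ValueError("r and s = n - 2r must both be positive")
    return sum(comb(n, j) for j in range(r, r + s + 1)) / 2 ** n


def joint_level_bounds(n, r):
    """Return (P(E_i)^2, P(E_i)), i.e. (1 - alpha, (1 - alpha)^(1/2))."""
    p = marginal_coverage(n, r)
    return p * p, p


if __name__ == "__main__":
    for n, r in [(6, 1), (8, 1), (10, 2), (20, 5)]:
        low, high = joint_level_bounds(n, r)
        print(n, r, round(low, 3), round(high, 3))
```

Because the coverage can only take the discrete values produced by integer r, the level cannot be chosen arbitrarily; one can only scan r for the value closest to a target.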


Received February 25, 1957; revised June 20, 1958. 
¹ Now at the University of California at Los Angeles.




If the variables were independent, probabilities could be multiplied, so that
$P(E_1 E_2) = 1 - \alpha$. This would give a set of intervals of exact confidence level
for $\nu_1$ and $\nu_2$, namely $z_{1,r}$ to $z_{1,r+s+1}$ for $\nu_1$, and $z_{2,r}$ to $z_{2,r+s+1}$ for $\nu_2$.

For dependent variables, the same set of intervals may be used as a set with
bounded confidence level, for it can be shown that $P(E_1 E_2) \ge 1 - \alpha$. It should
be noted that symmetric order statistics have been used, and indeed the result
would not hold otherwise.

The following proof establishes the necessary inequality.

THEOREM. $P(E_1 E_2) \ge P(E_1) P(E_2)$.

PROOF. Since $P(E_1 E_2) = P(E_2 \mid E_1) P(E_1)$, it will be sufficient to prove that
$P(E_2 \mid E_1) \ge P(E_2)$.

If for a certain observation, $y_{1j} > \nu_1$, then let the conditional probability
that $y_{2j} > \nu_2$ (this will be referred to as the probability of a "success") be denoted
by p. Then using the fact that $F_1(\nu_1) = F_2(\nu_2) = \tfrac{1}{2}$:


$$p = P(y_{2j} > \nu_2 \mid y_{1j} > \nu_1) = 2F(\nu_1, \nu_2). \tag{2}$$


Similarly, if it is known that $y_{1j} < \nu_1$, then let q be the conditional probability
that $y_{2j} > \nu_2$ (probability of a "success"). Then


$$q = P(y_{2j} > \nu_2 \mid y_{1j} < \nu_1) = 1 - 2F(\nu_1, \nu_2). \tag{3}$$

Thus, p + q = 1.


If it is known that $E_1$ has occurred, then r + i observations have $y_{1j} < \nu_1$,
and so have a conditional probability of success of q, where i may be 0, 1, ..., s;
r + s - i observations have $y_{1j} > \nu_1$, and so have a conditional probability of
success of p.

To obtain a generating function for the probabilities of various numbers of
successes, conditioned by the fact that $z_{1,r} < \nu_1 < z_{1,r+s+1}$, one may proceed
as follows. Let $E_1(i)$ be the event that $E_1$ occurs with r + i observations having
$y_{1j} < \nu_1$. Then $E_1 = \bigcup_{i=0}^{s} E_1(i)$.

Let $Y_<$ be the number of successes in the r + i observations which have
$y_{1j} < \nu_1$; let $Y_>$ be the number of successes in the r + s - i observations which
have $y_{1j} > \nu_1$; let $Z = Y_< + Y_>$ be the total number of successes. Then


$$E\Bigl(t^Z \Bigm| \bigcup_{i=0}^{s} E_1(i)\Bigr) = \sum_{i=0}^{s} E(t^Z \mid E_1(i)) \cdot P(E_1(i) \mid E_1) = \sum_{i=0}^{s} E(t^Z \mid E_1(i)) \cdot P(E_1(i))/P(E_1). \tag{4}$$


Since under the condition $E_1(i)$, $Y_<$ and $Y_>$ are independent,


$$E\Bigl(t^Z \Bigm| \bigcup_{i=0}^{s} E_1(i)\Bigr) = \sum_{i=0}^{s} E(t^{Y_<} \mid E_1(i))\, E(t^{Y_>} \mid E_1(i))\, P(E_1(i))/P(E_1). \tag{5}$$


The generating function G for the conditional probabilities of various numbers
of successes, given that $z_{1,r} < \nu_1 < z_{1,r+s+1}$, is, finally,


$$G = C \sum_{i=0}^{s} \frac{1}{(r+i)!\,(r+s-i)!} (p + qt)^{r+i} (q + pt)^{r+s-i}, \tag{6}$$


where


$$C^{-1} = \sum_{i=0}^{s} \frac{1}{(r+i)!\,(r+s-i)!}.$$

If $a_j$, j = 0, 1, ..., n, is defined as the coefficient of $t^j$ in G, then


$$G = \sum_{j=0}^{n} a_j t^j, \tag{7}$$

and

$$P(E_2 \mid E_1) = \sum_{j=r}^{r+s} a_j. \tag{8}$$


The task now is to show that $P(E_2 \mid E_1)$ has a minimum at $p = \tfrac{1}{2}$. To do
this, the derivative of G is obtained indirectly by differentiating equation (6)
with respect to p, and then manipulating the derivative, $G_p$, as follows:


$$
\begin{aligned}
G_p &= C \sum_{i=0}^{s} \frac{1 - t}{(r+i)!\,(r+s-i)!} \bigl[(r+i)(p + qt)^{r+i-1}(q + pt)^{r+s-i} - (r+s-i)(p + qt)^{r+i}(q + pt)^{r+s-i-1}\bigr] \\
&= \frac{C(1 - t)}{(r-1)!\,(r+s)!} (p + qt)^{r-1}(q + pt)^{r-1} \bigl[(q + pt)^{s+1} - (p + qt)^{s+1}\bigr].
\end{aligned}
$$


Let $C^* = C/((r-1)!\,(r+s)!)$. Then

$$
\begin{aligned}
G_p &= C^*(1 - t)(p + qt)^{r-1}(q + pt)^{r-1} \Bigl[(q^{s+1} - p^{s+1}) + \tbinom{s+1}{1} pq (q^{s-1} - p^{s-1}) t + \cdots + \tbinom{s+1}{1} pq (p^{s-1} - q^{s-1}) t^{s} + (p^{s+1} - q^{s+1}) t^{s+1}\Bigr] \\
&= C^*(1 - t)(q - p)(p + qt)^{r-1}(q + pt)^{r-1} \bigl[b_0(1 - t^{s+1}) + b_1(t - t^{s}) + b_2(t^2 - t^{s-1}) + \cdots\bigr] \\
&= C^*(1 - t)^2(q - p)(p + qt)^{r-1}(q + pt)^{r-1} \bigl[b_0 + (b_0 + b_1)t + (b_0 + b_1 + b_2)t^2 + \cdots + (b_0 + b_1 + b_2)t^{s-2} + (b_0 + b_1)t^{s-1} + b_0 t^{s}\bigr].
\end{aligned}
\tag{9}
$$


Here $b_j = \binom{s+1}{j} p^j q^j (q^{s-2j} + q^{s-2j-1}p + \cdots + p^{s-2j})$; hence the partial sums
of the $b_j$'s appearing as coefficients within the preceding square brackets are
positive and (excluding the trivial cases p = 0 or 1) they increase toward the
center coefficient(s).


Multiplication of the polynomial within the square brackets by

$$(p + qt)^{r-1}(q + pt)^{r-1} \tag{10}$$

can be accomplished by successive multiplications by (p + qt)(q + pt), and it
is easily verified that each multiplication yields a polynomial whose coefficients
increase toward the center coefficient(s). Thus


$$
\begin{aligned}
G_p &= C^*(q - p)(1 - 2t + t^2)(c_0 + c_1 t + c_2 t^2 + \cdots + c_2 t^{n-4} + c_1 t^{n-3} + c_0 t^{n-2}) \\
&= C^*(q - p)(d_0 + d_1 t + d_2 t^2 + \cdots + d_2 t^{n-2} + d_1 t^{n-1} + d_0 t^{n}).
\end{aligned}
\tag{11}
$$


Here $d_j = c_j - 2c_{j-1} + c_{j-2}$, for j = 0, 1, ..., and it is understood that
$c_{-1} = c_{-2} = 0$.

The derivative with respect to p of the conditional probability $P(E_2 \mid E_1)$ is
then $\sum_{j=r}^{r+s} C^*(q - p) d_j = 2C^*(q - p)(c_{r-2} - c_{r-1})$. Since $c_{r-2} - c_{r-1}$ is always
negative, the derivative is positive, zero, or negative according to whether
$p > \tfrac{1}{2}$, $p = \tfrac{1}{2}$ or $p < \tfrac{1}{2}$. Thus $P(E_2 \mid E_1)$ has a minimum at $p = \tfrac{1}{2}$, and
$P(E_1 E_2) \ge P(E_1) P(E_2)$.

Since $E_1 E_2 \subset E_1$, one may further write $P(E_1) \ge P(E_1 E_2) \ge P(E_1) P(E_2)$.
The confidence level for the intervals $z_{1,r}$ to $z_{1,r+s+1}$, $z_{2,r}$ to $z_{2,r+s+1}$ can
actually be as high as $(1 - \alpha)^{1/2}$ (for a distribution function such that p = 0
or p = 1) or as low as $1 - \alpha$ (when $p = \tfrac{1}{2}$).
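The inequality can be checked numerically for small n. The following sketch is an illustration only, not part of the original argument: it assumes that each observation falls into one of the four sign quadrants around $(\nu_1, \nu_2)$ with probabilities p/2, q/2, q/2, p/2, as implied by (2) and (3), and it enumerates the multinomial counts exactly. The name `joint_coverage` is hypothetical.

```python
from math import comb, factorial


def joint_coverage(n, r, p):
    """Exact P(E_1 E_2) for quadrant probabilities p/2, q/2, q/2, p/2."""
    q = 1.0 - p
    s = n - 2 * r
    total = 0.0
    # a: both below their medians; b: y1 below, y2 above;
    # c: y1 above, y2 below; d: both above.
    for a in range(n + 1):
        for b in range(n - a + 1):
            for c in range(n - a - b + 1):
                d = n - a - b - c
                below_1 = a + b  # observations with y1 < nu_1
                below_2 = a + c  # observations with y2 < nu_2
                if r <= below_1 <= r + s and r <= below_2 <= r + s:
                    ways = factorial(n) // (
                        factorial(a) * factorial(b) * factorial(c) * factorial(d)
                    )
                    total += ways * (p / 2) ** (a + d) * (q / 2) ** (b + c)
    return total


def marginal(n, r):
    """P(E_i) from equation (1)."""
    return sum(comb(n, j) for j in range(r, n - r + 1)) / 2 ** n


if __name__ == "__main__":
    n, r = 6, 1
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, round(joint_coverage(n, r, p), 4))
    print("P(E)^2 =", round(marginal(n, r) ** 2, 4), " P(E) =", round(marginal(n, r), 4))
```

Under these assumptions the joint coverage equals $P(E_1)P(E_2)$ at p = 1/2 and rises toward $P(E_1)$ as p moves to 0 or 1, in line with the theorem.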


4. Evaluation. One way to compare sets of confidence intervals is on the 
basis of their lengths, or the expected values of their lengths. I shall exhibit
some length comparisons when $y_1, y_2$ are jointly normally distributed with
means (or medians) $\nu_1, \nu_2$, variances $\sigma_1^2, \sigma_2^2$, and arbitrary covariance. In
Table I, the intervals for the medians obtained by the method of this paper 
(Method I) are compared with three sets of intervals for the means of a bi- 
variate normal distribution (Methods II and III and IV) obtained in another 
paper [1]. It should be mentioned that all four methods lead to intervals of 
bounded confidence. 

The figures given in the body of the table are values of $(n^{1/2}/\sigma_i) E(\tfrac{1}{2} l_i)$, where
$l_i$ is the length of the confidence interval for $\nu_i$, i = 1 or 2. Values of $1 - \alpha$
(which for Method I cannot be chosen arbitrarily) have been selected as close
as possible to .95.

Method II (section 4.2 in [1]), which uses Hotelling's T distribution, is similar
to Method I in that no assumptions need be made concerning the variances.
When n is small, the intervals are seen to be slightly longer on the average
for II than for I.

Method III, based on the Student t distribution, requires that the variances
be equal, though they may be unknown. These intervals are seen to be somewhat
shorter than those from Method I. Method III is found in section 7.2 of
[1].

In Method IV, the variances are assumed to be known and the intervals obtained
(the method of section 7.1 in [1]) are the shortest possible intervals for
means of the bivariate normal distribution when nothing is known about the
covariance. They thus make a useful standard for purposes of comparison.
These "best" intervals are considerably shorter than those from Method I, but
this must be balanced against the fact that for the latter method no assumptions
are necessary concerning the variance or concerning distributional form
(except for continuity).


TABLE I

Comparison of Expected Values of Lengths of Confidence Intervals
for the Means of a Bivariate Normal Distribution

Values of $(\sqrt{n}/\sigma_i) E(\tfrac{1}{2} l_i)$ for:

   n | 1 - α | (1 - α)^{1/2} | Interval used in Method I | Method I: Order Statistics | Method II: Hotelling | Method III: Variances Unknown but Equal | Method IV: Variances Known
   6 | .939  | .969          | $z_{i,1}$ to $z_{i,6}$    | 3.10                       | 3.72                 | 2.83                                    | 2.16
   8 | .984  | .992          | $z_{i,1}$ to $z_{i,8}$    | 4.03                       | 4.40                 | 3.54                                    | 2.65
  10 | .958  | .979          | $z_{i,2}$ to $z_{i,9}$    | 3.17                       | 3.21                 | 2.72                                    | 2.31
  20 | .976  | .988          | $z_{i,5}$ to $z_{i,16}$   | 3.33                       | 3.12                 | 2.77                                    | 2.51
 100 | .958  | .979          | $z_{i,39}$ to $z_{i,62}$  | 2.98                       | 2.57                 | 2.33                                    | 2.31

$l_i$ = length of the confidence interval for $\mu_i$ (or $\nu_i$), i = 1, 2.
Method I: $E(\tfrac{1}{2} l_i)$ computed using the expected value of order statistics in [2].
Method II: The intervals are $\bar{y}_i \pm (\hat\sigma_i/\sqrt{n}) c_\alpha$, i = 1, 2, where $c_\alpha$ is the $1 - \alpha$ point
in the distribution of Hotelling's T, and $\hat\sigma_i^2 = \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2/(n - 1)$, i = 1, 2.
Method III: The intervals are $\bar{y}_i \pm (\hat\sigma_i/\sqrt{n}) c_\alpha$, i = 1, 2, where $c_\alpha$ is the
$[1 + (1 - \alpha)^{1/2}]/2$ point of the Student t distribution with n - 1 degrees of freedom, and
$\hat\sigma_i^2 = \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2/(n - 1)$.
Method IV: The intervals are $\bar{y}_i \pm (\sigma_i/\sqrt{n}) c_\alpha$, i = 1, 2, where $c_\alpha$ is the
$[1 + (1 - \alpha)^{1/2}]/2$ point of the standard normal distribution.

The fact that Method I, assuming nothing about the form of the distribution,
gives shorter intervals for small n than Method II, which demands a normal
distribution, may seem somewhat surprising. The explanation lies in the
fact that, for a given value of $\alpha$, the actual probability of coverage is higher
for Method II than for Method I. For correlation $\rho = 0$ and $\rho = 1$, the actual
probabilities of coverage for all four methods are as follows:


        Methods I, III, IV       Method II
   n     ρ = 0     ρ = 1      ρ = 0     ρ = 1
   6     .939      .969       .989      .994
   8     .984      .992       .997      .999
  10     .958      .979       .991      .995
  20     .976      .988       .994      .997
 100     .958      .979       .988      .994







Throughout the preparation of this paper, it was conjectured that the same
set of intervals developed here might be used for a k-variate distribution. Mr.
Ernest V. Scheuer has, however, recently drawn the author's attention to the
following counterexample, which shows that it is possible for $P(E_1 E_2 E_3)$ to be
less than $P^3(E_1)$, where $E_i$ is the event that $z_{i,r} < \nu_i < z_{i,r+s+1}$, i = 1, 2, 3.

Let r = 1, and, for simplicity, let $\nu_1 = \nu_2 = \nu_3 = 0$.

Let $p_{111} = P(y_1 > 0, y_2 > 0, y_3 > 0)$, $p_{110} = P(y_1 > 0, y_2 > 0, y_3 < 0)$, and
similarly for $p_{101}, p_{011}, p_{100}, p_{010}, p_{001}$, and $p_{000}$.

It can be readily verified that $P(z_{11} < 0 < z_{1n},\ z_{21} < 0 < z_{2n},\ z_{31} < 0 < z_{3n})$
is smaller for $p_{111} = p_{100} = p_{010} = p_{001} = 1/4$, $p_{110} = p_{101} = p_{011} = p_{000} = 0$ than
it is under independence ($p_{111} = p_{110} = \cdots = p_{000} = 1/8$).
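The counterexample can be verified by brute-force enumeration for a small sample size. The sketch below is only an illustration: it takes n = 3 (so r = 1, s = 1), treats each observation as a sign vector (1 for a positive coordinate, 0 for a negative one), and computes exactly the probability that every coordinate has at least one positive and one negative observation, which is the event $z_{i1} < 0 < z_{in}$ for all i. The names `prob_all_straddle`, `independent` and `counterexample` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_all_straddle(sign_probs, n):
    """P(each coordinate has both signs among n i.i.d. sign vectors)."""
    patterns = list(sign_probs)
    k = len(patterns[0])
    total = Fraction(0)
    for draw in product(patterns, repeat=n):
        prob = Fraction(1)
        for pattern in draw:
            prob *= sign_probs[pattern]
        if prob == 0:
            continue
        # Coordinate i straddles 0 when its count of positives is strictly
        # between 0 and n.
        if all(0 < sum(pattern[i] for pattern in draw) < n for i in range(k)):
            total += prob
    return total


all_patterns = list(product((0, 1), repeat=3))
independent = {pattern: Fraction(1, 8) for pattern in all_patterns}
counterexample = {pattern: Fraction(0) for pattern in all_patterns}
for pattern in [(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]:
    counterexample[pattern] = Fraction(1, 4)

if __name__ == "__main__":
    print("independence:  ", prob_all_straddle(independent, 3))
    print("counterexample:", prob_all_straddle(counterexample, 3))
```

For n = 3 the enumeration gives 27/64 under independence and 3/8 for the counterexample distribution, so the product bound fails in three dimensions for this case.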


REFERENCES 


[1] Olive Jean Dunn, "Estimation of the means of dependent variables," Ann. Math.
Stat., Vol. 29 (December 1958), pp. 1095-1111.

[2] D. Teichroew, "Tables of expected values of order statistics and products of order
statistics for samples of size twenty or less from the normal distribution," Ann.
Math. Stat., Vol. 27 (June 1956), pp. 410-426.








A CONSISTENT ESTIMATOR OF A COMPONENT OF A 
CONVOLUTION 


By William R. Gaffey
University of California 


1. Introduction and summary. Suppose the observed random variable X is 
the sum of two independent random variables Z and Y, where Z has a normal 
distribution with zero expectation and a known variance, and Y has a dis- 
tribution function, say G(y), which is completely unknown. Then the distribu- 
tion function of X may be written as 


$$F(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} G(y) \exp\left[-\frac{(x-y)^2}{2\sigma^2}\right] dy, \tag{1.1}$$


where F(x) and G(y) are unknown. 

We consider here the problem of estimating G(y) from a sample $x_1, x_2, \ldots, x_n$.
Such a problem may arise if, for example, each $x_i$ represents a serum cholesterol
determination on one human being randomly selected from some population.
Then each $x_i$ may be thought of as the true cholesterol value for that person,
plus an "instrumental error" introduced by the complex chemical analysis.
We may wish to "correct" for the instrumental error, so to speak, by estimating
the distribution of true cholesterol levels in the population.

The maximum likelihood and minimum distance principles do not seem to 
yield estimators which may be expressed as explicit, more or less easily comput- 
able functions of the sample values. We present such an estimator, which is con- 
sistent at every continuity point of G(y). (We consider only continuity points 
throughout the paper.) The estimator is constructed by first exhibiting an in- 
version formula for G(y) in terms of the derivatives of F(x), and then replacing 
the derivatives by the difference quotients of the empiric distribution function
$F_n(x)$.

The asymptotic mean square error of the estimator is derived, and a rough 
rule is suggested for deciding when it is worthwhile to compute an estimate. 
The fact that the estimator is still consistent under certain kinds of dependence 
between Z and Y is indicated. Finally, some comments are made on the relation- 
ship between the present estimator and one derived by Eddington. 





2. The inversion formula for G(x). Denote by $F^{(2k)}(x)$ the 2kth derivative of
F(x) evaluated at x. Pollard [3] has derived a formula which states, in effect, that


$$\lim_{n \to \infty} \sum_{k=0}^{\infty} (-1)^k F^{(2k)}(x) (\sigma^2 t_n/2)^k / k! = G(x), \tag{2.1}$$


where $\{t_n\}$ is any increasing sequence of positive numbers with $\lim t_n = 1$.
Received November 26, 1957; revised September 20, 1958.

We require a modification of this inversion formula. Let $\{t_n\}$ be an increasing
sequence of positive numbers such that $\lim t_n = 1$ and $\lim n^k t_n^n = 0$ for any
k > 0. Consider


$$G_n(x) = \sum_{k=0}^{n} (-1)^k F^{(2k)}(x) (\sigma^2 t_n/2)^k / k!. \tag{2.2}$$

Then $\lim G_n(x) = G(x)$. To prove this it is sufficient to show that


$$\lim_{n \to \infty} \sum_{k=n+1}^{\infty} (-1)^k F^{(2k)}(x) (\sigma^2 t_n/2)^k / k! = 0. \tag{2.3}$$

Now


$$F^{(2k)}(x) = \frac{1}{2^k \sigma^{2k+1} \sqrt{2\pi}} \int_{-\infty}^{\infty} H_{2k}\!\left(\frac{x-y}{\sqrt{2}\,\sigma}\right) \exp\left[-\frac{(x-y)^2}{2\sigma^2}\right] G(y)\, dy, \tag{2.4}$$

where $H_{2k}(x)$ is the 2kth Hermite polynomial satisfying

$$|H_{2k}(x)| \le A\, 2^k \sqrt{(2k)!}\, \exp[x^2/2], \tag{2.5}$$


A being independent of x and k. ([5], p. 236)
Therefore,


$$|F^{(2k)}(x)| \le A \sqrt{(2k)!} / \sigma^{2k}, \tag{2.6}$$


and consequently the absolute value of the sum in (2.3) is at most equal to

$$A \sum_{k=n+1}^{\infty} \sqrt{(2k)!}\, (t_n/2)^k / k!. \tag{2.7}$$


Since the coefficients of $t_n^k$ are bounded above, the absolute value of (2.7) is no
greater than


$$A\, t_n^{n+1} (1 - t_n)^{-1}, \tag{2.8}$$


which approaches zero faster than any negative power of n.

If $G^{(r)}(x)$, the rth derivative of G(x), is integrable, and continuous at a particular
value of x, a similar argument shows that $\lim G_n^{(r)}(x) = G^{(r)}(x)$.

For later use, we note that by virtue of (2.6) and the continuity of any derivative
of F(x),

$$F^{(2k)}(a) = F^{(2k)}(b) + O(a - b) \sqrt{(2k)!} / \sigma^{2k}, \tag{2.9}$$

and that the upper bound of O(a - b) is independent of k.
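The truncated inversion sum (2.2) can be evaluated in closed form when G is itself normal, which gives a convenient numerical check. The sketch below is an illustration under that added assumption (G normal with variance $\tau^2$, so that F is normal with variance $\sigma^2 + \tau^2$); the derivatives of F are written with probabilists' Hermite polynomials, and all function names are hypothetical. In this special case the sum should approach $\Phi(x/\sqrt{\tau^2 + (1 - t)\sigma^2})$, which is G(x) when t = 1.

```python
import math


def normal_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def hermite_e(m, u):
    """Probabilists' Hermite polynomial He_m(u) via He_{j+1} = u He_j - j He_{j-1}."""
    if m == 0:
        return 1.0
    previous, current = 1.0, u
    for j in range(1, m):
        previous, current = current, u * current - j * previous
    return current


def cdf_derivative(order, x, s):
    """order-th derivative in x of Phi(x / s); order 0 returns the c.d.f."""
    u = x / s
    if order == 0:
        return normal_cdf(u)
    density = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return (-1) ** (order - 1) * hermite_e(order - 1, u) * density / s ** order


def truncated_inversion(x, sigma, tau, t, n_terms):
    """Sum (2.2) with F normal of variance sigma^2 + tau^2 (an assumed test case)."""
    s = math.sqrt(sigma ** 2 + tau ** 2)
    return sum(
        (-1) ** k * cdf_derivative(2 * k, x, s) * (sigma ** 2 * t / 2.0) ** k / math.factorial(k)
        for k in range(n_terms + 1)
    )


if __name__ == "__main__":
    print(truncated_inversion(0.5, 1.0, 1.0, 1.0, 40), normal_cdf(0.5))
```

With only a few terms the sum is still visibly short of G(x); the remaining terms shrink geometrically in this normal example, which is not guaranteed for an arbitrary G.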


3. The estimator of G(x) and its mean square error. Define the 2kth difference
quotient of the empiric distribution function $F_n(x)$ by


$$F_n^{(2k)}(x, h) = (2h)^{-2k} \sum_{j=0}^{2k} (-1)^j \binom{2k}{j} F_n(x + (k - j)2h), \tag{3.1}$$


for h > 0. The estimator of G(x), for a sample of size n, will then be


$$\hat{G}_n(x) = \sum_{k=0}^{n} (-1)^k F_n^{(2k)}(x, h_n) (\sigma^2 t_n/2)^k / k!, \tag{3.2}$$


where $\lim h_n = 0$.







Clearly, the asymptotic properties of the estimator will depend on the choice of
sequences $\{h_n\}$ and $\{t_n\}$. We derive below an asymptotic expression for the mean
square error on the assumption that $G^{(4)}(x)$ is integrable, and continuous at the
particular value of x involved. Where there is no possibility of confusion, the subscripts
are omitted from $h_n$ and $t_n$.

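Consider first the expectation of the 2kth difference quotient.


$$(2h)^{2k} E[F_n^{(2k)}(x, h)] = \int_{x-h}^{x+h} \int_{y_1-h}^{y_1+h} \cdots \int_{y_{2k-1}-h}^{y_{2k-1}+h} F^{(2k)}(y_{2k})\, dy_{2k} \cdots dy_1. \tag{3.3}$$


We may write


$$F^{(2k)}(y_{2k}) = F^{(2k)}(y_{2k-1}) + (y_{2k} - y_{2k-1}) F^{(2k+1)}(y_{2k-1}) + (1/2)(y_{2k} - y_{2k-1})^2 F^{(2k+2)}(\xi), \tag{3.4}$$


where $\xi$ is between $y_{2k}$ and $y_{2k-1}$. When the integral with respect to $y_{2k}$ is taken,
the second term on the right vanishes. Complete integration of the third term
on the right results in an expression which may be written as


$$(2h)^{2k} F^{(2k+2)}(\xi) h^2/6, \tag{3.5}$$


where $x - 2hk \le \xi \le x + 2hk$. Therefore,


$$(2h)^{2k} E[F_n^{(2k)}(x, h)] = 2h \int_{x-h}^{x+h} \cdots \int_{y_{2k-2}-h}^{y_{2k-2}+h} F^{(2k)}(y_{2k-1})\, dy_{2k-1} \cdots dy_1 + (2h)^{2k} F^{(2k+2)}(\xi) h^2/6. \tag{3.6}$$

Repeating the process, we find finally that


$$(2h)^{2k} E[F_n^{(2k)}(x, h)] = (2h)^{2k} F^{(2k)}(x) + (2h)^{2k} \sum_{i=1}^{2k} F^{(2k+2)}(\xi_i) h^2/6, \tag{3.7}$$

where $x - 2hk \le \xi_i \le x + 2hk$ for all i, or more simply,


$$E[F_n^{(2k)}(x, h)] = F^{(2k)}(x) + 2k F^{(2k+2)}(\eta_k) h^2/6, \tag{3.8}$$


where $x - 2hk \le \eta_k \le x + 2hk$. Making this substitution in the expectation
of $\hat{G}_n(x)$, we have


$$E[\hat{G}_n(x)] = G_n(x) + \frac{h^2}{6} \sum_{k=0}^{n} (-1)^k F^{(2k+2)}(\eta_k) (\sigma^2 t/2)^k\, 2k/k!. \tag{3.9}$$

Applying (2.9) we obtain, after some algebra,


$$E[\hat{G}_n(x)] = G_n(x) - \frac{\sigma^2 t h^2}{6} \sum_{k=0}^{n-1} (-1)^k F^{(2k+4)}(x) (\sigma^2 t/2)^k / k! + O\left[h^3 \sum_{k=1}^{n} k \sqrt{(2k+2)!}\; t^k / 2^k (k-1)!\right]. \tag{3.10}$$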




Now by virtue of the properties of the sequence $\{t_n\}$,

$$\sum_{k=1}^{n} k \sqrt{(2k+2)!}\; t_n^k / 2^k (k-1)! = O\left[\sum_{k=1}^{n} k^3 t_n^k\right] = O[(1 - t_n)^{-4}]. \tag{3.11}$$

Therefore, if $\lim h_n (1 - t_n)^{-4} = 0$, we have

$$E[\hat{G}_n(x)] - G_n(x) \sim -\frac{\sigma^2 t_n h_n^2}{6} G^{(4)}(x). \tag{3.12}$$

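Now consider the variance of $\hat{G}_n(x)$. We note [4] that

$$E[F_n(x) F_n(y)] = \frac{n-1}{n} F(x) F(y) + \frac{1}{n} F(\min(x, y)). \tag{3.13}$$

Writing out the expectation of $\hat{G}_n^2(x)$ with this substitution, we obtain

$$E[\hat{G}_n^2(x)] = \frac{n-1}{n} E^2[\hat{G}_n(x)] + B_n, \tag{3.14}$$

and

$$\sigma^2(\hat{G}_n(x)) = B_n - \frac{1}{n} E^2[\hat{G}_n(x)], \tag{3.15}$$


where


$$B_n = \frac{1}{n} \sum_{k=0}^{n} \sum_{r=0}^{n} \frac{(-1)^{k+r} (\sigma^2 t_n/2)^{k+r}}{k!\, r!\, (2h_n)^{2k+2r}} \sum_{j=0}^{2k} \sum_{s=0}^{2r} \binom{2k}{j} \binom{2r}{s} (-1)^{j+s} F[\min(x + (k-j)2h_n,\ x + (r-s)2h_n)]. \tag{3.16}$$


For convenience in writing, let $a_n = \sigma^2 t_n / 8h_n^2$. After some manipulation it can
be shown that


$$B_n = \frac{1}{n} E[\hat{G}_n(x)] + C_n, \tag{3.17}$$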


where 


om OE ELE Ove) 


-(—1)"1F(@ + (r — 8)2h,) — F(z + (k — j)2h,)). 
Using the fact that, for 0 S m < 2k, 


(3.19) > (—1)’ (?*) = (- 2 a ‘), 
j=0 J 
and, from (2.9), that 


F(x + (r — 8)2h,) — F(x + (k — j)2hn) 


3.20 
a = 2hal(r — 8) — (k — j)] + Ofha(r — 8)’ + hi(k — 35)’), 







we find that 





“. an” (2k + 2r — 2 
oe ret 22) 
: 2k +2r—2\ kr 
+o[% > (tan) |. 
If we let k + r = j, (3.21) may be rewritten as 
2h., tn (2a,)’ ‘ey * E 2n-1 (2a,)? (*’)] 
22 . ; - ——|: > 
(3.22) C.~ THT" a +0|- 2 mae 
It is easy to verify the following equalities as Taylor series: 
> aey l “ exp (4b cos’ z) — 1 9 


(3.21) 





mji\j-1 2 Jo cos’ x 
(3.23) Le . 
= | sin’ x exp (4b cos x) dx = bO(e™) 
and 
b’ ’ od 4 
(3.24) Ss ji ian (4b cos’ x) dx = O(e”). 
j=0 


Therefore, as b increases, (3.24) becomes negligible compared with (3.23). 
By the use of the integral form of the remainder, 


2n j . «/2 
b’ 2) —2 oo l —2 
(3.25) Ui 3 1 ) Fi 2x oun 2 
(= cos x) "dy — 1] “a 
: 4b cos* z 


and 


(3.26) > b (¥) pet. ests - exp (4b cos’ x) lf. "is iv dz 
: fo GL \I ~ (Qn — Il P 4b cos? z 


Let $b = 2a_n$. Now if $a_n/n \to 0$, it is known ([7], Chap. 7) that, for any
$y \ge 0$,


. 1 P 2n —v 
3.27) lim + [ ve dv= 1. 
( (2n)! S@ny 
Therefore, under this assumption, the ratios of (3.25) to (3.23), and of (3.26) 


to (3.24), approach unity as n increases, so that (3.26) may be neglected. As a 
result we have 





2h, <> (2a,)' Y “— , 
(3.28) C. i Pit 


and 





(3.29) o(Cx(2)) ~ BGa@)L = BOn@))) , he 5% an)’ (3 - *. 


n n ja jl j-1 







Putting together (3.29) and (3.12), and taking account of the rapidity of the
convergence of $G_n(x)$ to G(x), we have for the asymptotic mean square error,


$$E[(\hat{G}_n(x) - G(x))^2] \sim \frac{G(x)[1 - G(x)]}{n} + \frac{\sigma^4 t_n^2 h_n^4}{36} [G^{(4)}(x)]^2 + \frac{2h_n}{n} \sum_{j=1}^{2n} \frac{(2a_n)^j}{j!} \binom{2j-2}{j-1}. \tag{3.30}$$

In order for this asymptotic expression to be valid and for the estimator to be
consistent, $C_n$ must approach zero, and $h_n$ and $t_n$ must obey the restrictions
imposed during the derivation. These conditions on $h_n$ and $t_n$ are summarized
here:


(A) $\lim h_n (1 - t_n)^{-4} = 0$,

and 
: <. o'tn 

(B) lim = exp Ei = 0. 
Condition (B) is sufficient to ensure that $C_n$ and $a_n/n$ approach zero.

4. Specific sequences $\{h_n\}$ and $\{t_n\}$. The logical step, after deriving the mean
square error, is to determine sequences $\{h_n\}$ and $\{t_n\}$ which minimize it. In the
present case the complexity of (3.30) makes this extremely difficult. Alternatively,
we may search for easily computed sequences satisfying conditions (A)
and (B). Suppose we let


$$h_n = a (\ln n)^{-\alpha}, \quad a > 0,\ \alpha > 0, \tag{4.1}$$

and

$$t_n = 1 - (\ln n)^{-\beta}, \quad \beta > 0. \tag{4.2}$$


Then it may be verified that if $\beta < \alpha/4$, condition (A) is satisfied. If, in addition,
$\alpha = 1/2$ and $a \ge \sigma$, then condition (B) is satisfied, and in fact $C_n$ becomes
negligible.

In order to minimize the bias, whose square is the second term of (3.30), it is
reasonable to let a assume its minimum value. Finally, the convergence of $G_n$
to G depends on the fact that $\beta > 0$. A value convenient for computation, say
$\beta = 0.1$, is suggested. These "convenient" sequences are then


$$h_n = \sigma (\ln n)^{-1/2} \tag{4.3}$$

and


$$t_n = 1 - (\ln n)^{-0.1}. \tag{4.4}$$
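For concreteness, the "convenient" sequences (4.3) and (4.4) can be tabulated directly. The sketch below uses hypothetical function names and assumes n is at least 3 so that ln n exceeds 1 and $t_n$ is positive; `condition_a_term` evaluates $h_n (1 - t_n)^{-4}$, the quantity that condition (A) requires to tend to zero.

```python
import math


def convenient_sequences(n, sigma, beta=0.1):
    """Return (h_n, t_n) as in (4.3) and (4.4); assumes n >= 3."""
    log_n = math.log(n)
    h_n = sigma / math.sqrt(log_n)
    t_n = 1.0 - log_n ** (-beta)
    return h_n, t_n


def condition_a_term(n, sigma, beta=0.1):
    """h_n * (1 - t_n)^(-4), which should tend to zero under condition (A)."""
    h_n, t_n = convenient_sequences(n, sigma, beta)
    return h_n * (1.0 - t_n) ** -4


if __name__ == "__main__":
    for n in (30, 100, 10 ** 4, 10 ** 8):
        h_n, t_n = convenient_sequences(n, sigma=1.0)
        print(n, round(h_n, 4), round(t_n, 4), round(condition_a_term(n, 1.0), 4))
```

With these choices the term in condition (A) equals $\sigma (\ln n)^{-0.1}$, so it decreases only very slowly, and $t_n$ is still far from its limit 1 at moderate sample sizes.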




5. Remarks on the bias of $\hat{G}_n(x)$. It is possible that, with smaller sample sizes,
the bias introduced by the estimating procedure may be greater than the bias
involved in ignoring the whole problem and simply using the original sample
distribution function to estimate G(x). A reasonable rule of thumb for deciding
if it is worthwhile to compute $\hat{G}_n(x)$ is to do so if the maximum bias of $\hat{G}_n(x)$
does not exceed the maximum bias of $F_n(x)$ for the given sample size. To get
some idea of the order of magnitude of the sample sizes required, we will assume
that G(x) is a normal distribution function with variance $\tau^2$.

The maximum bias which can result from using $F_n(x)$ is


$$\max_x |E[F_n(x)] - G(x)| = \max_x \left[\Phi(x/\tau) - \Phi\left(x/\sqrt{\sigma^2 + \tau^2}\right)\right], \tag{5.1}$$


where $\Phi(x)$ is the standard normal distribution function. The maximum is attained when

$$x = \pm \tau \left[\left(1 + \frac{\tau^2}{\sigma^2}\right) \ln\left(1 + \frac{\sigma^2}{\tau^2}\right)\right]^{1/2}.$$


The maximum asymptotic bias of $\hat{G}_n(x)$ is

$$\max_x \frac{\sigma^2 t_n h_n^2}{6} |G^{(4)}(x)| = \frac{\sigma^2 t_n h_n^2}{6} \max_x |G^{(4)}(x)|, \tag{5.2}$$

where

$$G^{(4)}(x) = \frac{1}{\sqrt{2\pi}\,\tau^4} \left(\frac{3x}{\tau} - \frac{x^3}{\tau^3}\right) e^{-x^2/2\tau^2}. \tag{5.3}$$

$|G^{(4)}(x)|$ attains its maximum at $x = \pm .742\,\tau$, and

$$\max_x |G^{(4)}(x)| = \frac{1.38}{\sqrt{2\pi}\,\tau^4}. \tag{5.4}$$


Therefore, the maximum asymptotic bias of $\hat{G}_n(x)$ is no greater than the maximum
bias of $F_n(x)$ if (using the forms (4.3) and (4.4) for $h_n$ and $t_n$)


$$(\ln n)^{-1}\left[1 - (\ln n)^{-0.1}\right] \le 4.35 \sqrt{2\pi} \left(\frac{\tau}{\sigma}\right)^4 \left[\Phi\left(\left\{\left(1 + \frac{\tau^2}{\sigma^2}\right) \ln\left(1 + \frac{\sigma^2}{\tau^2}\right)\right\}^{1/2}\right) - \Phi\left(\frac{\tau}{\sigma} \left\{\ln\left(1 + \frac{\sigma^2}{\tau^2}\right)\right\}^{1/2}\right)\right]. \tag{5.5}$$

Since this inequality involves the asymptotic bias of $\hat{G}_n(x)$, it is presumably not
too trustworthy for small n. Substituting n = 30 and solving for $\sigma/\tau$, we find that
(5.5) holds if $\sigma/\tau < 3$. In other words, a sample of size 30 justifies computing
$\hat{G}_n(x)$ if the standard deviation of the known component is no more than three
times the standard deviation of the unknown component. Therefore, even
though (5.5) is an asymptotic expression, it seems reasonable to say that with
samples of size 30 or larger, it is worthwhile to compute $\hat{G}_n(x)$ for most situations
of practical interest.
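The rule of thumb can be evaluated numerically. The sketch below is an illustration under the section's own assumption that G is normal with variance $\tau^2$; it compares the maximum asymptotic bias (5.2) of the estimator, using the sequences (4.3) and (4.4), with the maximum bias (5.1) of the empirical distribution function. All function names are hypothetical.

```python
import math


def normal_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def max_bias_ecdf(sigma, tau):
    """Right side of (5.1), evaluated at the maximizing x given in the text."""
    x = tau * math.sqrt((1 + tau ** 2 / sigma ** 2) * math.log(1 + sigma ** 2 / tau ** 2))
    return normal_cdf(x / tau) - normal_cdf(x / math.sqrt(sigma ** 2 + tau ** 2))


def max_abs_g4(tau):
    """max |G^(4)(x)| from (5.3); the maximizer is x = u * tau with u^4 - 6u^2 + 3 = 0."""
    u = math.sqrt(3.0 - math.sqrt(6.0))  # approximately 0.742
    return (3 * u - u ** 3) * math.exp(-0.5 * u * u) / (math.sqrt(2 * math.pi) * tau ** 4)


def max_bias_estimator(n, sigma, tau):
    """Maximum asymptotic bias (5.2) with h_n from (4.3) and t_n from (4.4)."""
    h_n = sigma / math.sqrt(math.log(n))
    t_n = 1.0 - math.log(n) ** -0.1
    return sigma ** 2 * t_n * h_n ** 2 / 6.0 * max_abs_g4(tau)


def worthwhile(n, sigma, tau):
    """True when the rule of thumb favors computing the estimator."""
    return max_bias_estimator(n, sigma, tau) <= max_bias_ecdf(sigma, tau)


if __name__ == "__main__":
    for ratio in (1.0, 2.0, 2.9, 3.2, 4.0):
        print(ratio, worthwhile(30, sigma=ratio, tau=1.0))
```

For n = 30 the two sides of the comparison cross very close to $\sigma/\tau = 3$, so the outcome right at that ratio is sensitive to rounding of the constants.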


6. Dependence between Z and Y. Suppose now that the normal random
variable Z, instead of having expectation zero, is dependent on Y in the sense
that it has an expectation $\mu_y$ when Y = y. Then if $y + \mu_y$ is a continuous, strictly
monotone function of y, the analyses leading to (2.1) and (3.2) are still valid,
provided the derivatives and difference quotients are now evaluated at the point
$x + \mu_x$. Therefore, under this kind of dependence, $\hat{G}_n(x + \mu_x)$ estimates G(x).







7. Other estimators of G(x). Although the present paper arose from consideration
of some public health problems, the problem of instrumental error has had a
long history in astronomy. In particular, several solutions to the integral equation
(1.1), under varying restrictions on G(x), have been given by astronomers.
(See [2] and [6] for bibliographies.) With the exception of [2], however, the solutions
themselves are offered as estimators, without taking into account the error
involved in using an estimate of F(x). Consequently, their consistency is open
to question. In [2], an estimator for G(x) when F(x) is observed is given,
and its maximum bias computed when $G^{(1)}(x)$ is a normal probability density.
The approximate variance of the estimator when F(x) is subject to error is
also given, but the form of the bias shows that the estimator is not consistent.

It is of some interest to examine one of the first solutions to (1.1), given by
Eddington [1]. It is


$$\sum_{k=0}^{\infty} (-1)^k F^{(2k)}(x) (\sigma^2/2)^k / k! = G(x), \tag{7.1}$$


which may be thought of as Pollard's formula (2.1) with the limit and summation
operations interchanged. In practice, only the first two terms of (7.1) are used
as an estimator, and difference quotients are apparently used to approximate
the derivatives. However, even if we consider the whole series, and assume the
derivatives known, it is clear that the convergence of the solution depends on
the form of G(x), so that (7.1) is not consistent for arbitrary G(x).


REFERENCES 


[1] A. S. Eddington, "On a formula for correcting statistics for the effects of a known
probable error of observation," Mon. Not. R. Astron. Soc., Vol. 73 (1913), pp.
359-360.

[2] F. D. Kahn, "The correction of observational data for instrumental band width,"
Proc. Camb. Philos. Soc., Vol. 51 (1955), pp. 519-525.

[3] H. Pollard, "Distribution functions containing a Gaussian factor," Proc. Amer. Math.
Soc., Vol. 4 (1953), pp. 578-582.

[4] M. Rosenblatt, "Remarks on some nonparametric estimates of a density function,"
Ann. Math. Stat., Vol. 27 (1956), pp. 832-837.

[5] G. Szegő, Orthogonal Polynomials, American Mathematical Society, New York, 1939.

[6] R. J. Trumpler and H. F. Weaver, Statistical Astronomy, University of California
Press, Berkeley, 1953.

[7] D. V. Widder, The Laplace Transform, Princeton University Press, Princeton, 1946.





BAYES AND MINIMAX PROCEDURES IN SAMPLING FROM FINITE 
AND INFINITE POPULATIONS'—I 


By Om P. AGGARWAL 
Purdue University² and Stanford University


Summary. Some of the sampling methods and the methods of estimation 
usually employed in sample surveys are considered in terms of loss and risk 
functions. The loss function is taken as the sum of two components, one pro- 
portional to the square of the error of the estimate and the other proportional 
to the cost of obtaining the sample. Consideration is given to the problem of 
the allocation of the total sample size and only non-sequential estimates are 
discussed. As the loss function is convex and of finite expectation in each case, 
only non-randomized estimates are considered, since Hodges and Lehmann [5] 
have shown that under these conditions the class of non-randomized estimates is
essentially complete. Only simple random sampling and stratified sampling
methods are discussed in this part; the ratio, regression and sub-sampling methods
will be discussed in subsequent parts.


1. Introduction. In the current practice of conducting sample surveys, the 
statisticians have adopted one of the following two procedures (see, e.g., [2], 
[3], [4], [7], or [8]): (i) to get an estimate of maximum precision for a given total 
cost of the survey, or (ii) to get an estimate of given precision for a minimum 
total cost of the survey. The allocation of the resources for a given survey is 
usually carried out, keeping in mind one or the other of the above two aims. 
It is possible, however, to consider jointly the losses resulting from the errors 
in the estimates and from the cost of sampling, and to employ such sampling 
and estimation procedures as will, in some sense, ‘‘minimize’’ the total expected 
loss. Accordingly we shall take as loss function the sum of two components, 
one proportional to the square of the error of the estimate and the other pro- 
portional to the cost of obtaining the sample. The problems generally met in 
sampling surveys will be formulated in terms of the decision theory using this 
loss function and it will be seen that their solutions are the classical results in 
estimation and design. This appears to be a preliminary step toward further 
research in this field. 


Received April 29, 1958.

¹ Work done under the sponsorship of the Office of Naval Research while the author was
at Stanford University.

² On leave from Purdue University during 1958-59 as Statistician with the F.A.O. of
the United Nations on their Technical Assistance Mission to Chile.


2. Bayes and minimax estimates. The estimation problem with a fixed sample
size has the following structure. We are given a sample space X, a space of
probability distributions on X, $P_\Omega = \{p_\omega : \omega \in \Omega\}$, where $\Omega$ is an index set (generally
called parameter space), and a numerical-valued function g defined on
$\Omega$ whose value $g(\omega)$ the statistician wishes to estimate on the basis of the outcome
of an experiment, say $x \in X$. A non-randomized decision function for the
statistician, usually called an estimate³, is a numerical function $\delta$ defined on X,
specifying for each x the number $a \in A$ which will be chosen to estimate $g(\omega)$
when that x is observed. The space of actions A is here the real line. The loss
function L defined on $\Omega \times A$ is non-negative and is the loss incurred when $g(\omega)$
is estimated by a. The risk function R is defined by


$$R(\omega, \delta) = E_\omega L(\omega, \delta). \tag{2.1}$$


The subscript $\omega$ appended to the symbol E for expectation indicates that $\omega$ is
to be regarded as fixed when expectation is taken.

If it is assumed that $\omega$ is obtained by nature as the value of a random variable
having a probability distribution $\lambda$, a Bayes estimate with respect to the a
priori distribution $\lambda$ is defined as an estimate $\delta$ which minimizes the average
risk $\int R(\omega, \delta)\, d\lambda(\omega)$. If the statistician knew $\lambda$, he would choose this estimate as
his best action. But in the absence of any knowledge of $\lambda$ the statistician may
decide to use what is called, in the terminology of two-person zero-sum games, a
minimax strategy. A minimax estimate is defined as an estimate $\delta$ which minimizes
the "maximum" risk, $\sup_{\omega \in \Omega} R(\omega, \delta)$. In the same terminology, a least
favorable distribution or "maximin" strategy is defined as a distribution $\lambda$
that maximizes the "minimum" risk $\inf_\delta \int R(\omega, \delta)\, d\lambda(\omega)$.

The following theorem [6] gives in many cases a minimax estimate as well as
a least favorable distribution whenever the latter exists.

THEOREM 2.1. If a Bayes estimate $\delta_\lambda$ has constant (independent of $\omega$) risk
$R(\omega, \delta_\lambda) = r$, then $\delta_\lambda$ is minimax and $\lambda$ is a least favorable distribution.
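Theorem 2.1 can be illustrated with a standard textbook case that does not appear in this paper: estimating a binomial proportion p under squared-error loss. The sketch below assumes X is Binomial(n, p) and uses the posterior mean under a Beta(√n/2, √n/2) prior, a Bayes estimate whose risk works out to the constant 1/(4(1 + √n)²); by the theorem it is then minimax. The function names are hypothetical.

```python
from math import comb, sqrt


def risk(estimator, n, p):
    """Squared-error risk E_p[(estimator(X) - p)^2] for X ~ Binomial(n, p)."""
    return sum(
        comb(n, x) * p ** x * (1 - p) ** (n - x) * (estimator(x) - p) ** 2
        for x in range(n + 1)
    )


def constant_risk_estimator(n):
    """Posterior mean under a Beta(sqrt(n)/2, sqrt(n)/2) prior (assumed example)."""
    root = sqrt(n)
    return lambda x: (x + root / 2) / (n + root)


def sample_proportion(n):
    return lambda x: x / n


if __name__ == "__main__":
    n = 16
    for p in (0.1, 0.3, 0.5, 0.9):
        print(p, risk(constant_risk_estimator(n), n, p), risk(sample_proportion(n), n, p))
```

In this example the sample proportion has smaller risk near p = 0 or 1 but a larger maximum risk (at p = 1/2) than the constant-risk estimate, which is the trade-off the minimax criterion encodes.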

The following theorem [6] will give in many cases a minimax estimate where
no least favorable distribution exists.

THEOREM 2.2. If $\{\lambda_n\}$ is a sequence of a priori probability distributions, $\{r_n\}$
the sequence of associated Bayes risks, and if $r_n \to r$ as $n \to \infty$, and if there exists
some estimate $\delta$ for which $R(\omega, \delta) \le r$ for all $\omega$, then $\delta$ is a minimax estimate.

We shall frequently need another theorem in the sequel. Using the term "minimax
risk" for $\inf_\delta \sup_\omega R(\omega, \delta)$ we state and prove it as

THEOREM 2.3. If $\delta$, r are a minimax procedure and the minimax risk respectively,
assuming that the observations X follow any probability distribution $\omega \in \Omega^*$,
and if $\Omega \supset \Omega^*$ is a space of distributions for which the risk associated with $\delta$ does
not exceed r, then $\delta$ is a minimax procedure and r the minimax risk for all distributions
of X in $\Omega$.

PROOF. By hypothesis


$$r = \sup_{\omega \in \Omega^*} R(\omega, \delta) \le \sup_{\omega \in \Omega} R(\omega, \delta) \le r. \tag{2.2}$$


Hence equality holds. If d is any other procedure,

$$\sup_{\omega \in \Omega} R(\omega, \delta) = r \le \sup_{\omega \in \Omega^*} R(\omega, d) \le \sup_{\omega \in \Omega} R(\omega, d), \tag{2.3}$$


which shows that $\delta$ is a minimax procedure and r the minimax risk for all distributions
in $\Omega$.

³ It should be more properly called an estimator, to distinguish the function from a
specific numerical value, but it is hoped no confusion will be caused by using the same term
for both.


3. Sampling from a finite population. In its simplest form, sampling from a
finite population may be described in the following manner. We are given a sample
space and nature (or a conscious being) performs a fixed sample size experiment
and obtains a value of a random variable, which is a point $x = (x_1, x_2, \cdots, x_N)$
in the space $R^N$ of ordered $N$-tuples of real numbers or vectors with real
components. It should be mentioned, however, that not all random variables may
be available to nature. The statistician has to select one out of a class $A$ of
possible actions in complete or partial ignorance of $x$ and the particular
probability distribution employed by nature to obtain $x$. He (the statistician) incurs
a loss which is a bounded function of the selected action $a \in A$ and the point
$x \in R^N$, but not of the underlying conceptual distribution. However, for a given
$a$, the loss is assumed to be constant for all permutations of the coordinates of
$x$. The statistician can obtain partial information on $x$ by observing some fixed
number of coordinates of $x$, say $n$. The problem is: If the cost of observing $x_i$,
$i = 1, 2, \cdots, N$, is independent of $i$, how should the statistician select the
sample and choose $a$?

In the case that $\Omega = R^N$ and the strategy for nature corresponding to $\omega$ is to
choose $x = \omega$ with probability one, Blackwell and Girshick [1] have shown that
the invariance and sufficiency principles require the sampling scheme to select
each set of $n$ distinct integers from 1 to $N$ (without regard to order) with
probability $1/\binom{N}{n}$. This is the usual strategy of simple random sampling without
replacement. The proof extends to the more complex situations discussed in this
paper. Accordingly, we seek Bayes and minimax strategies using this sampling
scheme.


4. Statement of the problem. We shall be estimating the mean of a finite 
population with our loss as squared error. If the variance is not restricted some- 
how, our risk may be arbitrarily large. Accordingly, we bound, under any 
strategy of nature for choosing the finite population, the expected value of the 
variance of the finite population. Another possibility would be to divide the 
loss by the expected variance. Our method actually shows that the sample mean 
is minimax for each expected variance. 

Formally, the decision problem in which we are interested may be characterized 
as follows: 

(a) The sample space is $(X, \Omega, p)$, where $X$ is the $N$-dimensional Euclidean
space $R^N$, $\Omega$ is the set of all distributions $\omega$ on hyperplanes in $R^N$ of the form
$x_1 + x_2 + \cdots + x_N = $ constant, say $N\mu_\omega$, and subject to the restriction that

(4.1)    $E_\omega \sum_{i=1}^{N} (x_i - \mu_\omega)^2 = \int \cdots \int (x'x - N\mu_\omega^2)\, d\omega(x) \le (N - 1)\sigma^2,$





BAYES AND MINIMAX PROCEDURES 209 


where $x$ is the column vector with $x_1, x_2, \cdots, x_N$ as elements, $\sigma$ is a given
positive number, and $p_\omega = \omega$ for $\omega \in \Omega$.

(b) The action space $A$ is the real line $R^1$.

(c) The loss function $L$, defined on $(\Omega \times A)$, is given by

(4.2)    $L(\omega, a) = (a - \mu_\omega)^2.$

(d) The space $D$ of decision rules is the space of all ordered $n$-tuples of different
integers from 1 through $N$ together with all measurable mappings of $n$-space
into $A$. If $\delta = (i_1, \cdots, i_n; f)$, then $\delta(x) = f(x_{i_1}, \cdots, x_{i_n})$.

The application of invariance and sufficiency principles, as in the problem
stated in the last section, requires the sampling scheme to select each set of
ordered $n$-tuples of different integers from 1 through $N$ with equal probability.
Accordingly, it is sufficient to consider the following problem:

(a) The sample space $(X, \Omega, p)$, where $X$ is the $n$-dimensional Euclidean
space $R^n$, $\Omega$ is the same as before, and $p_\omega$ for $\omega \in \Omega$ is the distribution of a sample
$x = (x_1, \cdots, x_n)$ obtained by simple random sampling without replacement from
$x_1, x_2, \cdots, x_N$, which are distributed according to $\omega$.

(b) The action space A is the same as before. 

(c) The loss function L is the same as before. 

(d) The space D of decision rules is the set of all measurable mappings of 
X into A. 

The problem of obtaining a minimax estimate of the mean $\mu_\omega$ of the finite
population $(x_1, \cdots, x_N)$ is solved in the following way. Consider nature's
strategy as picking $\mu_\omega$ from $N(0, \theta^2)$, a normal distribution with mean zero and
variance $\theta^2$, and given $\mu_\omega$, letting $\omega$, with probability one, be singular $N$-variate
normal with mean $\mu_\omega$ and variance $\sigma^2(N - 1)/N$ for each component and
covariance $-\sigma^2/N$ for each pair of components. A Bayes estimate is obtained
with respect to this strategy of nature, regarded as a member of a sequence
$\{\lambda_\theta\}$ of a priori distributions, and the limit, if any, of the corresponding sequence
of Bayes risks $\{r_\theta\}$ as $\theta \to \infty$ is obtained, say $r$. Then if we can find some
estimate $\delta$ for which the risk $R(\omega, \delta)$, without assuming normality of $\omega$, does not
exceed $r$, then by virtue of Theorems 2.2 and 2.3, $\delta$ is a minimax estimate.

The discussion so far is given in detail for the case when the sampling plan 
is simple random sampling without replacement. Under somewhat different 
situations different sampling plans will be required. We shall not discuss the 
derivation of the minimax strategies for the choice of sampling plans (it can 
be shown by an extension of Blackwell and Girshick’s proof of the optimality 
of simple random sampling [1] that the sampling plans given are optimum under 
the circumstances) but will take the sampling plans as being given and confine 
our attention to the problem of estimating the mean of the populations by em- 
ploying techniques similar to the one outlined above. 


5. Bayes and minimax procedures for estimating the mean of a finite population
with simple random sampling (without replacement). The average risk
corresponding to an a priori distribution $\xi$ for nature and a decision function $\delta$
used for estimation of $g(\omega)$ by the statistician is obviously

(5.1)    $R(\xi, \delta) = \underset{\Omega}{E}\, \underset{X}{E}\, [(\delta(x) - g(\omega))^2 \mid \omega],$


where $x$ stands for $(x_1, x_2, \cdots, x_n)$ and the symbol below $E$ indicates the
space over which the expectation is to be taken. For the sake of simplicity we
shall not attempt to distinguish between a random variable and its observed
value. Since the integrand is non-negative, we may change the order of taking
expectations and write

(5.2)    $R(\xi, \delta) = \underset{X}{E}\, \underset{\Omega}{E}\, [(\delta(x) - g(\omega))^2 \mid x].$

This is minimized by choosing, for each $x$, that number $\delta(x)$ which minimizes
$E[(\delta(x) - g(\omega))^2 \mid x]$ and the number $\delta(x)$ which does it is clearly

(5.3)    $\delta_\xi(x) = E(g(\omega) \mid x).$

This gives the minimum value of $E[(\delta(x) - g(\omega))^2 \mid x]$ as $\sigma^2_{g(\omega) \mid x}$, the variance
of the conditional distribution of $g(\omega)$ given $x$. Then, by (5.2), the Bayes risk
$r_\xi$ is given by

(5.4)    $r_\xi = \underset{X}{E}\, \sigma^2_{g(\omega) \mid x}.$


These results hold in general whenever the loss function for the estimation of
$g(\omega)$ is of the form $L(\omega, a) = [a - g(\omega)]^2$.

In the problem under consideration, taking the a priori distribution for nature
as mentioned in the last section, the distribution of the sample $(x_1, \cdots, x_n)$
given $\omega$ is $n$-variate normal with mean $\mu_\omega$ and variance $\sigma^2(N - 1)/N$ for each
$x_i$ and covariance $-\sigma^2/N$ for each pair $(x_i, x_j)$, $i \ne j$. Since it can be shown easily
that the sample mean $\bar{x}$ is a sufficient statistic for $\mu_\omega$, we see from (5.3) and
(5.4) respectively that the Bayes estimate $\delta_\theta(x) = E(\mu_\omega \mid x) = E(\mu_\omega \mid \bar{x})$, and
the Bayes risk $r_\theta = E\sigma^2_{\mu_\omega \mid x} = E\sigma^2_{\mu_\omega \mid \bar{x}}$. Now $\mu_\omega$ is $N(0, \theta^2)$ and, given $\mu_\omega$, $\bar{x}$ is
$N(\mu_\omega, v)$ where $v = (n^{-1} - N^{-1})\sigma^2$, so $\mu_\omega$ and $\bar{x}$ have a bivariate normal
distribution. It is then easily seen that the conditional distribution of $\mu_\omega$ given $\bar{x}$ is
normal with mean $\theta^2\bar{x}(\theta^2 + v)^{-1}$ and variance $\theta^2 v(\theta^2 + v)^{-1}$. Since the variance is
independent of $x$, these are respectively Bayes estimate $\delta_\theta(x)$ and Bayes risk $r_\theta$.
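
The shrinkage form of the Bayes estimate and the behaviour of the Bayes risk as the prior variance grows can be illustrated numerically. The following is only a sketch under the normal model stated above; the function name `posterior` and the numerical values are illustrative assumptions, not part of the paper.

```python
def posterior(theta2, sigma2, N, n, xbar):
    """Bayes estimate and Bayes risk for the population mean.

    Assumed model (as in the text): mu ~ N(0, theta2) and, given mu,
    xbar ~ N(mu, v) with v = (1/n - 1/N) * sigma2.
    """
    v = (1.0 / n - 1.0 / N) * sigma2
    weight = theta2 / (theta2 + v)      # shrinkage factor toward the prior mean 0
    return weight * xbar, weight * v    # posterior mean, posterior variance


# Illustrative (hypothetical) values: N = 100, n = 10, sigma^2 = 4, xbar = 2.5.
v = (1.0 / 10 - 1.0 / 100) * 4.0
for theta2 in (1.0, 100.0, 1.0e6):
    estimate, risk = posterior(theta2, 4.0, 100, 10, 2.5)
    # As theta2 grows the estimate approaches xbar and the risk approaches v.
    print(theta2, estimate, risk, v)
```

For every finite prior variance the Bayes risk stays strictly below $v$, which is why $v$ appears only as a limit and not as the risk of any single Bayes estimate.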

To find a minimax estimate for $\mu_\omega$, we consider if the sequence $\{r_\theta\}$ tends to a
limit as $\theta \to \infty$. It is seen that it does and the limit $r$ is given by

(5.5)    $r = \lim_{\theta \to \infty} r_\theta = v = \frac{(N - n)\sigma^2}{nN}.$


By Theorem 2.2, if we can find some estimate $\delta$ for which the risk does not
exceed $r$, then that $\delta$ is a minimax estimate. Trying $\delta(x) = \bar{x}$ $(= \lim_{\theta \to \infty} \delta_\theta(x))$,
we see that the risk corresponding to $\delta$ is given by

(5.6)    $R(\omega, \delta) = E_\omega(\bar{x} - \mu_\omega)^2 = E_\omega E((\bar{x} - \mu_\omega)^2 \mid x_1, \cdots, x_N)$
         $= E_\omega \frac{N - n}{nN(N - 1)} \sum_{i=1}^{N} (x_i - \mu_\omega)^2$
         $\le \frac{N - n}{nN}\, \sigma^2$, by (4.1),
         $= r.$

Hence $\bar{x}$ is a minimax estimate and the risk corresponding to this estimate
(minimax risk) does not exceed $(N - n)\sigma^2/nN$.
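
The conditional risk of the sample mean for a fixed population, used in the middle step above, can be checked by brute force. The sketch below enumerates every equally likely sample of size $n$ and compares the exact mean squared error with the closed form $(N - n)\sum(x_i - \mu)^2 / (nN(N - 1))$; the function names and the small population are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations


def srs_risk_of_mean(population, n):
    """Exact E(xbar - mu)^2 over all equally likely samples of size n."""
    N = len(population)
    mu = Fraction(sum(population), N)
    samples = list(combinations(population, n))   # combinations act on positions
    total = sum((Fraction(sum(s), n) - mu) ** 2 for s in samples)
    return total / len(samples)


def closed_form(population, n):
    """(N - n) / (n N (N - 1)) times the sum of squared deviations."""
    N = len(population)
    mu = Fraction(sum(population), N)
    ss = sum((Fraction(x) - mu) ** 2 for x in population)
    return Fraction(N - n, n * N * (N - 1)) * ss


population = [3, 7, 1, 12, 5, 9]      # hypothetical integer-valued population
print(srs_risk_of_mean(population, 3), closed_form(population, 3))
```

Exact rational arithmetic is used so that the two quantities can be compared for equality rather than up to rounding.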


6. Bayes and minimax procedures with stratified sampling plan. In survey
designs stratification is a procedure whereby the entire population is divided
into a number of strata and sampling is carried out independently in each
stratum. Let $c_i$ be the known cost of sampling per unit in the $i$th stratum, $\mu_i$
and $\sigma_i^2$ the unknown mean and the known variance of the population in the $i$th
stratum, and let $k$ be the number of strata. Then, for a given cost $C = \sum_{i=1}^{k} c_i n_i$,
where $n_i$ is the number of observations sampled from the $i$th stratum, it is well
known (see e.g. [2], [3], [4], or [8]) that the procedure which estimates
$\sum_{i=1}^{k} N_i\mu_i$ with minimum variance is to choose $n_i$ proportional to $N_i\sigma_i/\sqrt{c_i}$
and to use $\sum_{i=1}^{k} N_i\bar{X}_i$ as an estimate, where $N_i$ is the size of the $i$th stratum
and $\bar{X}_i$ the sample mean of the $n_i$ observations selected at random from it.
The same values for $n_i$ are obtained when for a given variance of the estimate,
the object is to minimize the total cost of sampling. In this section we shall
investigate for this problem some Bayes and minimax procedures, first for an
infinite and then for finite populations.

A. Infinite populations. Suppose that the $i$th stratum consists of an infinite
population with unknown mean $\mu_i$ and known upper bound $\sigma_i^2$ for the variance,
$i = 1, 2, \cdots, k$, and that we have to estimate a linear function of the $\mu_i$, say
$U = \sum_{i=1}^{k} a_i\mu_i$, where the $a_i$ are some given real numbers. Without loss of
generality we may take $\sum a_i = 1$. For the sake of simplicity we shall assume that
none of the $a_i$ is zero. The loss function $L$ is given by

(6.1)    $L(U, \delta) = (\delta - U)^2 + \sum_{i=1}^{k} c_i n_i,$

where $n_i\,(> 0)$ is the size of the sample chosen from the $i$th stratum, $c_i$ the
sampling cost per unit in that stratum, and $\delta$ is a function of the sample
$\{X_{ij};\ i = 1, 2, \cdots, k;\ j = 1, 2, \cdots, n_i\}$, where $X_{ij}$ is the $j$th observation from
the $i$th stratum. For the sake of simplicity of notation, as before, we shall not
attempt to distinguish between a random variable and its observed value.

It may be noted that a slightly more realistic loss function would be

(6.2)    $L(U, \delta) = a(\delta - U)^2 + \sum_{i=1}^{k} c_i n_i,$

where $a$ is some constant depending upon the desired relative accuracy of the
results and the cost of experimentation in a given situation. But it is easily seen
that any procedure corresponding to this loss function can be obtained from the
corresponding procedure when the loss function is (6.1) simply by substituting
$c_i/a$ for $c_i$. The risk associated with it will be simply $a$ times the corresponding
risk when the loss function is (6.1). For this reason also the condition $\sum a_i = 1$
in this section does not detract from the generality of the $a_i$.

Assume at first that, given $\omega$, the distribution in each stratum is normal, with
variance $\sigma_i^2$ in the $i$th stratum, $i = 1, 2, \cdots, k$. We may conjecture that for this
problem there is no least favorable distribution of nature, since as $U$ could have
any real value, what we would expect it to be is a uniform distribution over the
real line, but this is not a distribution.⁴ We shall assume that the $\mu_i$ are normally
and independently distributed each with mean zero and variance $\theta^2$, and find
Bayes solutions corresponding to the sequence $\{\lambda_\theta\}$ of a priori distributions of
$U$ resulting from the distribution of $\mu_i$. Let $\delta_\theta$ denote a corresponding Bayes
estimate of $U$. If the Bayes risks $r_\theta$ corresponding to $\delta_\theta$ tend to a limiting value
$r$ when $\theta$ tends to infinity, then any estimate which has its risk less than or
equal to $r$ will be a minimax estimate by Theorem 2.2 under the normality
assumption of the observations. This assumption may then be removed easily
with the help of Theorem 2.3.

We may regard the $n_i$ as fixed for the purpose of finding the estimates. Letting
$\delta^*$ be a minimax estimate for given $n_i$, we shall choose the $n_i$ so as to minimize
the risk,

(6.3)    $R(U, \delta^*) = E(\delta^* - U)^2 + \sum_{i=1}^{k} c_i n_i,$

as a function of the $n_i$.

Since we are working with fixed $n_i$, we shall omit the $\sum c_i n_i$ term. The loss
function is now simply the square of the difference between the estimate and
the quantity $U$ being estimated and, as in the last section, Bayes estimate $\delta_\theta$
and Bayes risk $r_\theta$ are given respectively by the mean and the expectation of
the variance of the conditional distribution of $U$, given the sample $x$. However,
since the stratum sample means are jointly sufficient for $\mu_1, \cdots, \mu_k$, we may
replace the sample $x$ in the last sentence by $\bar{X}_1, \cdots, \bar{X}_k$, where $\bar{X}_i$ is the sample
mean from the $i$th stratum.

Now, the $\mu_i$ are independently and normally distributed with means zero and
variance $\theta^2$, and given $\mu_i$, the $\bar{X}_i$ are independently and normally distributed
with mean $\mu_i$ and variance $\sigma_i^2/n_i$, so $\mu_1, \cdots, \mu_k$ and $\bar{X}_1, \cdots, \bar{X}_k$ have a joint
$2k$-variate normal distribution. It is then easily seen that the conditional
distribution of $\mu_i$, given $\bar{X}_1, \cdots, \bar{X}_k$, is normal with mean

(6.4)    $y_i = \frac{\theta^2 \bar{X}_i}{\theta^2 + \sigma_i^2/n_i}$


4 I understand from an oral communication from H. Rubin that a proof of this
conjecture has been given by M. A. Girshick.







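and variance

(6.5)    $v_i = \frac{\theta^2 \sigma_i^2/n_i}{\theta^2 + \sigma_i^2/n_i},$

and that $\mu_1, \cdots, \mu_k$ are mutually independent given $\bar{X}_1, \cdots, \bar{X}_k$. Thus, for
given $\bar{X}_1, \cdots, \bar{X}_k$, the distribution of $U$ is normal with mean $\sum a_i y_i$ and
variance $\sum a_i^2 v_i$. We thus conclude that the Bayes estimate $\delta_\theta(x) = \delta_\theta(\bar{X}_1, \cdots, \bar{X}_k)$
$= \sum_{i=1}^{k} a_i y_i$, and since the variance of the conditional distribution of $U$ is
independent of $\bar{X}_1, \cdots, \bar{X}_k$, the Bayes risk $r_\theta = \sum_{i=1}^{k} a_i^2 v_i$.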

Minimax estimate for given $n_i$. Letting $\theta \to \infty$, we see that $r_\theta \to r$, where
$r = \sum_{i=1}^{k} a_i^2\sigma_i^2/n_i$. Thus, if we can find some estimate $\delta^*$ with risk $\le r$, then
$\delta^*$ will be a minimax estimate by virtue of Theorem 2.2. Let us try the limiting
Bayes estimate,

(6.6)    $\lim_{\theta \to \infty} \delta_\theta(x) = \sum_{i=1}^{k} a_i \bar{X}_i = \delta^*(x)$, say.

Since the $\bar{X}_i$ are normal and independent with means $\mu_i$ and variances $\sigma_i^2/n_i$,
$\sum a_i\bar{X}_i$ is normal with mean $\sum a_i\mu_i = U$ and variance $\sum a_i^2\sigma_i^2/n_i$. Hence the
risk corresponding to the estimate $\delta^*(x) = \sum a_i\bar{X}_i$ is equal to $r$ which proves
that $\sum a_i\bar{X}_i$ is a minimax estimate of $U$ for given $n_i$.

It may be of interest to point out that although the Bayes estimates
$\delta_\theta = \sum a_i y_i$, where $y_i$ is given by (6.4), being unique (for given $\theta$) are admissible,
we cannot conclude from this the admissibility of $\delta^*$ because of the limiting
process. The same remark applies to the other Bayes and minimax estimates
obtained later, but we shall not go into the question of admissibility in this
paper.

Removal of the normality assumption. Let us now do away with the assumption
of normality of the distribution of $X_{ij}$. Suppose that whatever the joint
distributions of the $X_{ij}$, the distribution in the $i$th stratum has an unknown mean
$\mu_i$, and the sample mean $\bar{X}_i$, for any sample size $n_i$, has a variance not
exceeding a known positive number $\sigma_i^2/n_i$ for $i = 1, 2, \cdots, k$, and that the $\bar{X}_i$
are uncorrelated. This is somewhat more general than the usual assumption of
$X_{ij}$ being independent with mean $\mu_i$ (unknown) and variance $\sigma_i^2$ (known) in
the $i$th stratum, in which case the stratified sampling procedure is generally used.
Let us calculate the risk $R$ corresponding to the minimax estimate $\delta^*(x) = \sum a_i\bar{X}_i$
obtained under the assumption of the normal distribution of the observations
in each stratum. It is easily verified that under these general assumptions, for
given $n_i$,

(6.7)    $R = E\left(\sum_{i=1}^{k} a_i\bar{X}_i - U\right)^2 = E\left[\sum_{i=1}^{k} a_i(\bar{X}_i - \mu_i)\right]^2 \le \sum_{i=1}^{k} a_i^2\sigma_i^2/n_i = r.$







Applying Theorem 2.3 now, we conclude that the minimax estimate $\sum a_i\bar{X}_i$
obtained under the assumption of normality of observations is still a minimax
estimate for given $n_i$ under the general assumptions given in the beginning of
this paragraph.

Minimax strategy for choosing the $n_i$. Restoring the term $\sum c_i n_i$ in the risk
function, one can choose "optimum" $n_i$ if the variances of the populations in
different strata are known, rather than only the upper bounds, by minimizing
the risk as a function of the $n_i$. However, if only the upper bounds and not
the actual variances are assumed to be known, the optimum choice of the $n_i$
against the largest allowed variances $\sigma_i^2$ may not be optimum for other values
of the variances, but it will still be a "minimax" choice. In other words, a
minimax strategy for the statistician is to choose the $n_i$ to be "optimum" against
the maximum allowed variances $\sigma_i^2$, and then estimate $U$ using the $\delta^*$ for these
$n_i$. This statement follows from the following theorem.

THEOREM 6.1. Suppose the space of strategies for the statistician is a union
of spaces, say $D = \bigcup_c D_c$. If $\delta_c$ is minimax in $D_c$ against $\Omega$, the space of nature's
strategies, and if $R(\omega, \delta_c)$ is constant for each $c$, say $R(\omega, \delta_c) = r_c$, then the $\delta_c$
minimizing $r_c$, if it exists, is minimax in $D$.

PROOF. Let $\delta \in D$ be any strategy for the statistician. Then $\delta \in D_c$ for some $c$.
Since $\delta_c$ is minimax in $D_c$,

(6.8)    $\max_\omega R(\omega, \delta) \ge \max_\omega R(\omega, \delta_c) = r_c.$

Let $\delta_{c^*}$ be the $\delta_c$ minimizing $r_c$ and denote the risk corresponding to $\delta_{c^*}$ by $r_{c^*}$.
Then

(6.9)    $r_c \ge r_{c^*} = \max_\omega R(\omega, \delta_{c^*}).$

From (6.8) and (6.9), $\max_\omega R(\omega, \delta_{c^*}) \le \max_\omega R(\omega, \delta)$ for all $\delta \in D$, hence $\delta_{c^*}$ is
minimax in $D$.

We, therefore, choose optimum $n_i$ corresponding to the variances in the different
strata as the $\sigma_i^2$. For given $n_i$, the risk corresponding to $\delta^*$ is given by

(6.10)    $R(\omega, \delta^*) = \sum_{i=1}^{k} \left[\frac{a_i^2\sigma_i^2}{n_i} + c_i n_i\right].$

Now we want to choose the $n_i$ so that this risk is minimum under the restriction
that the $n_i$ are positive integers. Since the $i$th term on the right hand side of
(6.10) depends on $n_i$ alone, it is sufficient to minimize $a_i^2\sigma_i^2/n_i + c_i n_i$ subject to
the restriction that $n_i$ is a positive integer. Denoting this expression by $f(n_i)$,
we see that

(6.11)    $f(n_i + 1) - f(n_i) = c_i - \frac{a_i^2\sigma_i^2}{n_i(n_i + 1)}.$

To minimize $f(n_i)$, we choose the smallest positive integral value for $n_i$ for which
the difference (6.11) is positive; in other words, the smallest positive integer $n_i$
for which $(n_i + 1/2)^2$ exceeds $a_i^2\sigma_i^2/c_i + 1/4$. This gives

(6.12)    $n_i = $ integer nearest to $\sqrt{\frac{a_i^2\sigma_i^2}{c_i} + \frac{1}{4}}.$

When $\sqrt{a_i^2\sigma_i^2/c_i + 1/4}$ lies exactly between two integers, say $m$ and $m + 1$,
the risk is equal and minimum for both $n_i = m$ and $n_i = m + 1$, and it is
immaterial which of the two nearest integers is chosen for $n_i$.
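
The integer rule (6.12) can be checked against a direct search over $n_i$. The sketch below is only an illustration under the loss (6.1); the function names `risk_term` and `minimax_n` and the numerical inputs are assumptions introduced here, not notation from the paper.

```python
import math


def risk_term(n, a, sigma2, c):
    """f(n) = a^2 sigma^2 / n + c n, the contribution of one stratum to (6.10)."""
    return a * a * sigma2 / n + c * n


def minimax_n(a, sigma2, c):
    """Smallest positive integer n with (n + 1/2)^2 > a^2 sigma^2 / c + 1/4."""
    target = a * a * sigma2 / c + 0.25
    n = max(1, math.floor(math.sqrt(target) - 0.5))
    while (n + 0.5) ** 2 <= target:     # difference (6.11) not yet positive
        n += 1
    return n


# Hypothetical stratum: a = 0.5, sigma^2 = 16, cost 0.01 per unit.
n_star = minimax_n(0.5, 16.0, 0.01)
best = min(range(1, 2000), key=lambda n: risk_term(n, 0.5, 16.0, 0.01))
print(n_star, best, risk_term(n_star, 0.5, 16.0, 0.01))
```

In an exact tie the loop returns the larger of the two integers, which by the remark above has the same risk as the smaller one.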

B. Finite population. Suppose that $x_{ij}$ $(i = 1, 2, \cdots, k;\ j = 1, 2, \cdots, N_i)$
denotes some numerical characteristic of the $j$th unit in the $i$th stratum.
Suppose further that the $N_i\,(> 1)$ are known, the means $\mu_i$ of the strata are unknown,
the upper bounds of the variances of the populations in the strata are known,
say $\sigma_i^2 \ge \frac{1}{N_i - 1}\sum_{j=1}^{N_i} (x_{ij} - \mu_i)^2$, and that we are required to estimate a
linear function,

(6.13)    $T = \sum_{i=1}^{k} a_i\mu_i,$

of the population means $\mu_i$, where $a_i$ are arbitrary known real numbers with
$\sum a_i = 1$, the loss function $L$ being given by

(6.14)    $L(T, \delta) = (\delta - T)^2 + \sum_{i=1}^{k} c_i n_i,$

where $\delta$ is an estimate for $T$, and $c_i$, $n_i$ denote the cost of sampling per unit and
the number of units sampled in the $i$th stratum. The sampling plan given is to
decide upon $k$ positive integers $n_i$, $i = 1, 2, \cdots, k$, and then choose a sample of
size $n_i$ by simple random sampling without replacement from the $i$th stratum,
thus obtaining a sample of total size $n = \sum_{i=1}^{k} n_i$. We shall first assume that
the $n_i$ are determined somehow and obtain Bayes and minimax procedures and
the corresponding risks for given $n_i$. Later we shall see how to choose the $n_i$
so that the risk obtained is minimized over the choice of $n_i$. As before, by
Theorem 6.1, this choice of the "optimum" $n_i$ will be a minimax strategy for the
statistician.

As in the case of simple random sampling discussed in Section 4, which is a
special case of this problem for $k = 1$, we are considering now a decision problem
in which the distribution $\omega \in \Omega$ consists of the product of $k$ independent
distributions $\omega_i$ on hyperplanes in $R^{N_i}$ of the form $x_{i1} + \cdots + x_{iN_i} = $ constant,
say $N_i\mu_{\omega_i}$, and subject to the restrictions that

(6.15)    $E_{\omega_i} \sum_{j=1}^{N_i} (x_{ij} - \mu_i)^2 \le (N_i - 1)\sigma_i^2,$
j=l 


where the constant is denoted by $N_i\mu_{\omega_i}$ to make $\mu_{\omega_i}$ the mean of $x_{i1}, x_{i2}, \cdots, x_{iN_i}$,
and $\mu_{\omega_i}$ itself is being written as $\mu_i$ for the sake of convenience of notation, the
$\sigma_i$ are given positive numbers, and $p_\omega$ for $\omega \in \Omega$ is the distribution of $k$ independent
samples $x_i = (x_{i1}, \cdots, x_{in_i})$, the $i$th sample being obtained by simple
random sampling without replacement from $x_{i1}, \cdots, x_{iN_i}$, distributed
according to $\omega_i$.

The problem of obtaining a minimax estimate of $T$ for given $n_i$ is solved as
before. Consider nature's strategy as picking each $\mu_i\,(= \mu_{\omega_i})$ from $N(0, \theta^2)$ and
given $\mu_i$, letting the distributions $\omega_i$, with probability one, be singular
$N_i$-variate normal with mean $\mu_i$ and variance $\sigma_i^2(N_i - 1)/N_i$ for each component
and covariance $-\sigma_i^2/N_i$ for each pair of components. A Bayes estimate of $T$ is
obtained with respect to this strategy of nature, which is regarded as a member
of a sequence $\{\lambda_\theta\}$ of a priori distributions, and the limit, if any, of the
corresponding sequence of Bayes risks $\{r_\theta\}$ as $\theta \to \infty$ is obtained, say $r$. Then an
estimate $\delta$ for which the risk $R(\omega, \delta)$, without assuming normality of $\omega$, does not
exceed $r$ is, by Theorems 2.2 and 2.3, a minimax estimate for given $n_i$.

With nature's strategy as explained in the last paragraph, the distribution of
the sample $x = \{x_{ij};\ j = 1, \cdots, n_i;\ i = 1, \cdots, k\}$ given $\omega$ is the product of
$k$ distributions, the $i$th being $n_i$-variate normal with mean $\mu_i$ and variance
$\sigma_i^2(N_i - 1)/N_i$ for each component and covariance $-\sigma_i^2/N_i$ for each pair of
components. Again, since the set $(\bar{x}_1, \cdots, \bar{x}_k)$ of the sample means from the
$k$ strata is a sufficient statistic for the set $(\mu_1, \cdots, \mu_k)$, and hence for $T$, we may
replace the sample $x$ in (5.3) and (5.4) by the set $(\bar{x}_1, \cdots, \bar{x}_k)$. Now, the strata
are independent, i.e. the $k$ pairs $(\mu_i, \bar{x}_i)$ are independent, and it follows from the
calculations in Section 5 that the conditional distribution of an individual $\mu_i$
given $\bar{x}_i$ is normal with mean $y_i = \theta^2\bar{x}_i(\theta^2 + v_i)^{-1}$ and variance $\theta^2 v_i(\theta^2 + v_i)^{-1}$,
where $v_i = (n_i^{-1} - N_i^{-1})\sigma_i^2$. Thus, for given $\bar{x}_1, \cdots, \bar{x}_k$, the conditional
distribution of $T = \sum a_i\mu_i$ is normal with mean $\sum a_i y_i$, and variance $\sum a_i^2\theta^2 v_i(\theta^2 + v_i)^{-1}$.
We thus conclude that the Bayes estimate for $T$ is

(6.16)    $\delta_\theta(x) = \sum_{i=1}^{k} a_i y_i = \sum_{i=1}^{k} a_i\theta^2\bar{x}_i(\theta^2 + v_i)^{-1},$

and as the variance of the conditional distribution of $T$ is independent of $x$, the
Bayes risk is

(6.17)    $r_\theta = \sum_{i=1}^{k} a_i^2\theta^2 v_i(\theta^2 + v_i)^{-1} + \sum_{i=1}^{k} c_i n_i.$


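To find a minimax estimate for $T$ now, we consider if the sequence $\{r_\theta\}$ tends
to a limit as $\theta \to \infty$. It will be seen that it does, and the limit $r$ is given by

(6.18)    $r = \lim_{\theta \to \infty} r_\theta = \sum_{i=1}^{k} a_i^2 v_i + \sum_{i=1}^{k} c_i n_i = \sum_{i=1}^{k} a_i^2\, \frac{N_i - n_i}{N_i n_i}\, \sigma_i^2 + \sum_{i=1}^{k} c_i n_i.$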


All we have to do now is to find some estimate $\delta^*$ for which the risk does not
exceed $r$, and by Theorem 2.2, if any such $\delta^*$ exists, it will be a minimax estimate
for given $n_i$. Trying $\delta^*(x) = \sum a_i\bar{x}_i$ $(= \lim_{\theta \to \infty} \delta_\theta(x))$, we see that the risk
corresponding to $\delta^*$ is given by

(6.19)    $R(\omega, \delta^*) = E_\omega(\delta^* - T)^2 + \sum_{i=1}^{k} c_i n_i = E_\omega\left[\sum_{i=1}^{k} a_i(\bar{x}_i - \mu_i)\right]^2 + \sum_{i=1}^{k} c_i n_i.$

Noting the fact that the strata are independent and utilizing the result (5.6) for
a single stratum, it is seen at once that (6.19) reduces to

(6.20)    $R(\omega, \delta^*) \le \sum_{i=1}^{k} a_i^2\, \frac{N_i - n_i}{N_i n_i}\, \sigma_i^2 + \sum_{i=1}^{k} c_i n_i = r.$

Hence the usual estimate $\sum a_i\bar{x}_i$ is a minimax estimate for given $n_i$.

Minimax strategy for choosing the $n_i$. We now choose the $n_i$ so that the
minimax risk for given $n_i$ and largest allowed variances in the strata is minimum
under the restriction that the $n_i$ are positive integers and $\le N_i$. This risk is
given by

(6.21)    $r = \sum_{i=1}^{k} \left[a_i^2\sigma_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right) + c_i n_i\right].$

This expression differs from that in (6.10) by a quantity which is independent
of $n_1, n_2, \cdots, n_k$, and hence is minimized by the same $n_i$ as before provided
$n_i \le N_i$, and otherwise by $n_i = N_i$.

As a special case, consider the problem of the estimation of the overall mean
of the finite population, $\mu = N^{-1}\sum_{i=1}^{k}\sum_{j=1}^{N_i} x_{ij}$, where $N = \sum_{i=1}^{k} N_i$. Choosing
$a_i = N_i/N$, $T = \sum a_i\mu_i = \mu$, and we see that a minimax procedure of
estimating the mean $\mu$ is to choose the $n_i$ as

(6.22)    $n_i = $ integer nearest to $\sqrt{\frac{N_i^2\sigma_i^2}{N^2 c_i} + \frac{1}{4}}$ and $\le N_i$,

and then employ the usual estimate $N^{-1}\sum_{i=1}^{k} N_i\bar{x}_i$ for these $n_i$. This rule for
finding the minimax $n_i$ is more exact than the one commonly stated in the
literature, namely the allocation of the total sample size in proportion to
$N_i\sigma_i/\sqrt{c_i}$. This greater exactness may be quite useful in the case of high $c_i$'s.
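
A small numerical sketch of the allocation (6.22), including the cap $n_i \le N_i$, is given below. It is an illustration only: the function names, the three hypothetical strata and their costs are assumptions made here, and the result is compared with a direct search over the admissible $n_i$ in (6.21).

```python
import math


def finite_allocation(sizes, sigma2s, costs):
    """n_i = integer nearest to sqrt(N_i^2 s_i^2 / (N^2 c_i) + 1/4), capped at N_i."""
    N = sum(sizes)
    alloc = []
    for N_i, s2, c in zip(sizes, sigma2s, costs):
        a = N_i / N                                   # weight a_i = N_i / N
        nearest = math.floor(math.sqrt(a * a * s2 / c + 0.25) + 0.5)
        alloc.append(min(max(nearest, 1), N_i))
    return alloc


def minimax_risk(alloc, sizes, sigma2s, costs):
    """Risk (6.21) for the given allocation."""
    N = sum(sizes)
    return sum((N_i / N) ** 2 * s2 * (1.0 / n - 1.0 / N_i) + c * n
               for n, N_i, s2, c in zip(alloc, sizes, sigma2s, costs))


sizes = [10, 200, 790]             # hypothetical stratum sizes N_i
sigma2s = [40000.0, 25.0, 4.0]     # hypothetical variance bounds sigma_i^2
costs = [0.001, 0.002, 0.5]        # hypothetical unit costs c_i
alloc = finite_allocation(sizes, sigma2s, costs)
print(alloc, minimax_risk(alloc, sizes, sigma2s, costs))
```

In this example the first stratum is small but highly variable, so the unrestricted rule would ask for more units than exist and the allocation is capped at a complete enumeration of that stratum.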


7. Acknowledgement. I wish to express my gratitude to the late Professor 
M. A. Girshick, who encouraged me to work on this problem and gave generous 
help and guidance. Thanks are also due to Professors David Blackwell and 
Herman Rubin for several helpful discussions during the course of this work. I 
also take this opportunity to record my appreciation of some discussions with 
Professor John Pratt which helped to clarify some of the concepts during the 
preparation of this paper. 


REFERENCES

[1] DAVID BLACKWELL AND M. A. GIRSHICK, Theory of Games and Statistical Decisions,
John Wiley and Sons, New York, 1954.

[2] W. G. COCHRAN, Sampling Techniques, John Wiley and Sons, New York, 1953.

[3] W. E. DEMING, Some Theory of Sampling, John Wiley and Sons, New York, 1950.

[4] M. H. HANSEN, W. N. HURWITZ, AND W. G. MADOW, Sample Survey Methods and Theory,
Vol. II, John Wiley and Sons, New York, 1953.

[5] J. L. HODGES, JR., AND E. L. LEHMANN, "Some problems in minimax point estimation,"
Ann. Math. Stat., Vol. 21 (1950), pp. 182-197.

[6] E. L. LEHMANN, Mimeographed notes on the Theory of Estimation, University of
California, Berkeley, Chap. IV, 1950.

[7] P. C. MAHALANOBIS, "On Large Scale Sample Surveys," Phil. Trans. Roy. Soc. London,
Ser. B, Vol. 231 (1944), pp. 329-451.

[8] P. V. SUKHATME, Sampling Theory of Surveys with Applications, Iowa State College
Press, Ames, 1954.





AN APPROXIMATION USEFUL IN UNIVARIATE 
STRATIFICATION 


By Gunnar EKMAN 
Institute of Mathematical Statistics, University of Stockholm 
1. Introduction. The problem of minimizing a sum $\sum_1^n P_h\sigma_h$, where $P_h$ and
$\sigma_h^2$ denote the area and conditional variance in the interval $(x_{h-1}, x_h)$ of a density
$f(x)$, arises in the theory of optimum univariate stratification (see Dalenius,
[1]). In [1] Dalenius shows that the sum $\sum P_h\sigma_h$ is minimized when the
conditions

(1)    $\frac{\sigma_h^2 + (x_h - \mu_h)^2}{\sigma_h} = \frac{\sigma_{h+1}^2 + (x_h - \mu_{h+1})^2}{\sigma_{h+1}}$

are fulfilled $(h = 1, \cdots, (n - 1);\ x_0 = -\infty,\ x_n = +\infty)$, where $\mu_h$ denotes
the conditional mean in the interval $(x_{h-1}, x_h)$.

In order to avoid the computational difficulties presented by determining
$\{x_h\}$ such that the conditions (1) are satisfied, various approximations to (1)
have been proposed. A brief summary of these results is given in [2]. In this
article a new approximation will be derived and numerical examples will be
given.

We shall show that under certain conditions¹ and for a density over a finite
range, points $\{x_h\}$ satisfying the equalities

(1')    $(x_h - x_{h-1})P_h = C_n, \qquad h = 1, 2, \cdots, n,$

where $C_n$ is a constant dependent on $n$, approximately satisfy the minimal
conditions (1). For a density over an infinite range the above is obviously not
applicable, in which case, however, a certain modification can be made,
substituting the conditions (12), (13) below for (1'). The basic result will be
derived under the assumption of a large $n$, i.e. correspondingly small intervals
$(x_h - x_{h-1})$; asymptotically as $n$ approaches infinity the conditions (1) and
(1') will be proven equivalent. In practice, of course, $n$ is often rather small,
hardly greater than 4 or 5, which certainly does not fulfill this requirement of a
large $n$. It is still possible that even for $n = 2, 3$, etc., the points satisfying
(1') provide a good approximation to (1) in the sense of a $\sum_1^n P_h\sigma_h$ near the
minimal value. In order to ascertain whether this may be the case or not,
actual numerical computation has been carried out for three densities and for
$n = 2, 3, 4$, and 5. In the table under "Numerical Examples" the points $\{x_h\}$
obtained by applying (1') or the substitute conditions (12), (13), may be

Received February 28, 1958; revised July 3, 1958. 

1 These conditions, which concern primarily the regularity of the density function $f(x)$,
are imposed in order to facilitate the mathematical derivation of the final result, and
should be of no interest or concern to the reader desirous only of applying this result in
practice.









compared with the points satisfying the actual minimal conditions (1); below
is given a comparison of the respective $[\sum P_h\sigma_h]^2$:

          f(x) = 2(1 - x)         f(x) = e^{-x}         f(x) = x e^{-x}
   n      (1')       (1)          (1')      (1)         (1')      (1)
   2      0.01517    0.01505      0.2856    0.2855      0.6420    0.6370
   3      0.00693    0.00688      0.1333    0.1332      0.3090    0.3075
   4      0.00395    0.00393      0.0769    0.0768      0.1811    0.1804
   5      0.00255    0.00254      0.0500    0.0500      0.1189    0.1185











It therefore seems not unlikely that the conditions (1') are suitable substitutes
for the exact conditions (1); the differences in the above table are seen to be
comparatively small and decrease both absolutely and percentagewise as $n$
increases. It is nevertheless appropriate to mention a general type of density
for which (1') may be expected to give rather poor results, namely such $f(x)$
with a long (finite) "tail" (for example a $\chi^2$-distribution with a large number
of d.f.) or with $f(x) = 0$ in an end point. In this case it is advisable to use (12),
(13) unless these are difficult analytical expressions; in the case of an infinite
"tail" this is imperative.

The determination of points $\{x_h\}$ satisfying (1') or (12), (13) is by no means
entirely free of computational difficulties, although these points are of course
considerably easier to find than those satisfying (1). Trial and error or various
iterative methods may suffice in some cases, but the need for a more systematic
approach arises even for comparatively convenient expressions for $f(x)$. It is
apparent that with a knowledge of the constant $C_n$ the determination of the
points becomes trivial, as they may then be found simply by "fitting" with the
aid of a table over the distribution function. An iterative method of finding $C_n$
is described in the paragraphs "First approximation to the constant $C_n$" and
"Adaption to Numerical Computation". The mathematical arguments leading
to (1') following in the next paragraph are rather simple and straightforward,
involving only the use of Taylor series and elementary algebraic manipulation.
An outline of a method of finding the points satisfying the exact minimal
conditions (1) concludes the paper.
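
As a rough illustration of how points with equal products $(x_h - x_{h-1})P_h$ can be located for the first tabulated density, $f(x) = 2(1 - x)$ on $(0, 1)$, the sketch below finds the constant by plain bisection. This is a stand-in chosen here for simplicity and is not the iterative method the paper goes on to describe; all function names are hypothetical.

```python
import math


def cdf(x):
    """Distribution function of the example density f(x) = 2(1 - x) on [0, 1]."""
    return 2.0 * x - x * x


def bisect(func, lo, hi, iters=100):
    """Root of a nondecreasing function on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def interior_points(c, n, x0=0.0, xn=1.0):
    """March from x0 so that (x_h - x_{h-1}) * P_h = c for h = 1, ..., n - 1.

    Returns None when some stratum does not fit inside the range (c too large).
    """
    points = [x0]
    for _ in range(n - 1):
        prev = points[-1]
        g = lambda x: (x - prev) * (cdf(x) - cdf(prev)) - c
        if g(xn) < 0.0:
            return None
        points.append(bisect(g, prev, xn))
    return points


def ekman_points(n, x0=0.0, xn=1.0):
    """Boundaries x_0 < ... < x_n with equal products (x_h - x_{h-1}) * P_h."""
    def mismatch(c):
        pts = interior_points(c, n, x0, xn)
        if pts is None:
            return 1.0                       # c is too large
        last = (xn - pts[-1]) * (cdf(xn) - cdf(pts[-1]))
        return c - last                      # zero when the last stratum matches
    c = bisect(mismatch, 0.0, (xn - x0) * (cdf(xn) - cdf(x0)))
    return interior_points(c, n, x0, xn) + [xn]


def sum_P_sigma(points):
    """Sum of P_h * sigma_h for f(x) = 2(1 - x), from exact antiderivatives."""
    m0 = lambda x: 2.0 * x - x ** 2                      # integral of f
    m1 = lambda x: x ** 2 - 2.0 * x ** 3 / 3.0           # integral of x f
    m2 = lambda x: 2.0 * x ** 3 / 3.0 - x ** 4 / 2.0     # integral of x^2 f
    total = 0.0
    for a, b in zip(points, points[1:]):
        p = m0(b) - m0(a)
        mean = (m1(b) - m1(a)) / p
        var = max((m2(b) - m2(a)) / p - mean ** 2, 0.0)
        total += p * math.sqrt(var)
    return total


pts = ekman_points(2)
print(pts, sum_P_sigma(pts) ** 2)
```

For $n = 2$ the equal-product condition reduces to $x_1^2 - 3x_1 + 1 = 0$, so the computed boundary can be compared with $(3 - \sqrt{5})/2$, and the squared sum can be compared with the corresponding entry of the table above.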


2. Approximation over a finite range. Consider a density $f(x)$ over a finite
range $(x_0, x_n)$. We assume that the derivatives $f'(x)$ and $f''(x)$ exist and are
continuous over the whole range. We introduce a function $H(x)$ by defining

$H'''(x) = f(x),$

$H''(x) = \int_{x_0}^{x} H'''(t)\, dt,$

$H'(x) = \int_{x_0}^{x} H''(t)\, dt,$

$H(x) = \int_{x_0}^{x} H'(t)\, dt.$







$H'(x), \cdots, H^{(5)}(x)$ exist therefore and are continuous. One finds by partial
integration:

$\int_a^b x^2 f(x)\, dx = 2[H(b) - H(a)] - 2[bH'(b) - aH'(a)] + [b^2H''(b) - a^2H''(a)],$

$\int_a^b x f(x)\, dx = -[H'(b) - H'(a)] + [bH''(b) - aH''(a)],$

$\int_a^b f(x)\, dx = [H''(b) - H''(a)].$
By substituting the expressions in the right hand members of these equalities
in

$P_h\mu_h = \int_{x_{h-1}}^{x_h} x f(x)\, dx,$

$P_h[\sigma_h^2 + \mu_h^2] = \int_{x_{h-1}}^{x_h} x^2 f(x)\, dx,$

where $P_h$, $\mu_h$, and $\sigma_h^2$ are the area, conditional mean and conditional variance
of $f(x)$ in the interval $(x_{h-1}, x_h)$, we obtain by some calculations²

$P_h[\sigma_h^2 + (x_h - \mu_h)^2] = 2\left\{H(x_h) - \left[H(x_{h-1}) + (x_h - x_{h-1})H'(x_{h-1}) + \frac{(x_h - x_{h-1})^2}{2} H''(x_{h-1})\right]\right\},$

$P_h(x_h - \mu_h) = \{H'(x_h) - [H'(x_{h-1}) + (x_h - x_{h-1})H''(x_{h-1})]\},$

$P_h = [H''(x_h) - H''(x_{h-1})].$


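We note that the right hand members consist of $H(x_h)$, $H'(x_h)$ and $H''(x_h)$
minus the first terms in their respective Taylor expansions about the point
$x = x_{h-1}$. Continuing these expansions, the following results are obtained:

(2)    $P_h[\sigma_h^2 + (x_h - \mu_h)^2] = 2\left[\frac{(x_h - x_{h-1})^3}{3!} H'''(x_{h-1}) + \frac{(x_h - x_{h-1})^4}{4!} H^{(4)}(x_{h-1}) + \frac{(x_h - x_{h-1})^5}{5!} H^{(5)}(\xi_1)\right],$

(3)    $P_h(x_h - \mu_h) = \left[\frac{(x_h - x_{h-1})^2}{2!} H'''(x_{h-1}) + \frac{(x_h - x_{h-1})^3}{3!} H^{(4)}(x_{h-1}) + \frac{(x_h - x_{h-1})^4}{4!} H^{(5)}(\xi_2)\right],$

(4)    $P_h = \left[\frac{(x_h - x_{h-1})}{1!} H'''(x_{h-1}) + \frac{(x_h - x_{h-1})^2}{2!} H^{(4)}(x_{h-1}) + \frac{(x_h - x_{h-1})^3}{3!} H^{(5)}(\xi_3)\right],$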


Th 
2 These identities may also be obtained by partially integrating / (zx — z)*f(x) da, 
fh-1 
k = 0,1, 2. 





222 GUNNAR EKMAN 


where £; are points in the interval (x,_; , z,). By multiplying (2) by (4) and 
subtracting the square of (3), the following identity is obtained: 


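$$P_h^2\sigma_h^2 = \frac{(x_h - x_{h-1})^4}{12}\left\{H'''(x_{h-1})\left[H'''(x_{h-1}) + (x_h - x_{h-1})H^{(4)}(x_{h-1})\right] + R_1\right\}. \tag{5}$$

Squaring both members of (2) we obtain:

$$P_h^2[\sigma_h^2 + (x_h - \mu_h)^2]^2 = \frac{(x_h - x_{h-1})^6}{9}\left\{H'''(x_{h-1})\left[H'''(x_{h-1}) + \frac{(x_h - x_{h-1})}{2}H^{(4)}(x_{h-1})\right] + R_2\right\}. \tag{6}$$

$R_1$ and $R_2$ are terms of the second order or higher in $(x_h - x_{h-1})$. For large
$n$, that is for small intervals $(x_h - x_{h-1})$, these terms may be neglected,
introducing thereby only a slight error. Substituting $f(x)$ and $f'(x)$ for $H'''(x)$ and
$H^{(4)}(x)$, and dividing (6) by (5), the following approximative identity is
obtained:

$$\frac{[\sigma_h^2 + (x_h - \mu_h)^2]^2}{\sigma_h^2} \approx \frac{4(x_h - x_{h-1})}{3}\cdot\frac{f(x_{h-1})(x_h - x_{h-1}) + \tfrac{1}{2}f'(x_{h-1})(x_h - x_{h-1})^2}{f(x_{h-1}) + f'(x_{h-1})(x_h - x_{h-1})}. \tag{7}$$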


In the numerator of the second factor of the right hand member of (7) we
have the first two terms of the Taylor expansion of $P_h$ about the point
$x = x_{h-1}$, whereas the denominator is the partial derivative of the numerator with
respect to $x_h$, i.e. the first two terms of the expansion of $f(x_h)$ in the same point.
Approximating once again by neglecting terms of greater order in the
expansions of $P_h$ and $f(x_h)$, we obtain finally

$$\frac{[\sigma_h^2 + (x_h - \mu_h)^2]^2}{\sigma_h^2} \approx \frac{4(x_h - x_{h-1})P_h}{3f(x_h)}. \tag{8}$$

Proceeding in entirely the same fashion the analogous expression

$$\frac{[\sigma_{h+1}^2 + (x_h - \mu_{h+1})^2]^2}{\sigma_{h+1}^2} \approx \frac{4(x_{h+1} - x_h)P_{h+1}}{3f(x_h)} \tag{9}$$

may be derived.
Applying these results to the identity (1), one obtains

$$(x_h - x_{h-1})P_h \approx (x_{h+1} - x_h)P_{h+1}. \tag{10}$$

Applying (10) finally with $h = 1, \ldots, (n - 1)$, we find an approximation
to (1), namely

$$(x_h - x_{h-1})P_h = C_n, \tag{11}$$

where $C_n$ is a constant.
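As a quick numerical sanity check of approximation (8), the following sketch compares its two sides on one short interval. It assumes the density $f(x) = 2(1 - x)$ used later in the numerical examples and a plain midpoint rule for the moments; the helper names `moments` and `check_eq8` are illustrative and do not come from the paper.

```python
def moments(f, lo, hi, steps=20000):
    """Midpoint-rule estimates of P, mu and sigma^2 of f restricted to (lo, hi)."""
    w = (hi - lo) / steps
    p = m1 = m2 = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * w
        fx = f(x) * w
        p += fx
        m1 += x * fx
        m2 += x * x * fx
    mu = m1 / p
    var = m2 / p - mu * mu
    return p, mu, var


def check_eq8(f, lo, hi):
    """Return (left side, right side) of approximation (8) for the interval (lo, hi)."""
    p, mu, var = moments(f, lo, hi)
    lhs = (var + (hi - mu) ** 2) ** 2 / var
    rhs = 4.0 * (hi - lo) * p / (3.0 * f(hi))
    return lhs, rhs


f = lambda x: 2.0 * (1.0 - x)          # assumed example density on (0, 1)
lhs, rhs = check_eq8(f, 0.2, 0.3)
rel_err = abs(lhs - rhs) / lhs
```

For the interval (0.2, 0.3) the two sides agree to well under one percent, which is in line with the neglected terms being of higher order in the interval length.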





We have tacitly assumed $f(x) \ne 0$; the case $f(x) = 0$ will usually present itself
only for $x = x_0$ or $x = x_n$, in which case (12), (13) below may be used instead
of (11). When $f(x) \ne 0$, and assuming without loss of generality a range of unit
length, we see from (7) that the first neglected term in the numerator is
$O(x_h - x_{h-1})^3$, which implies the same degree of approximation in (8), (9).
Accordingly the square roots of the members of (8), (9) differ by terms of the
third order. This in turn implies both that the partial derivatives of $\sum P_h\sigma_h$
(being in fact $(f(x_h)/2)\{[\sigma_h^2 + (x_h - \mu_h)^2]/\sigma_h - [\sigma_{h+1}^2 + (x_h - \mu_{h+1})^2]/\sigma_{h+1}\}$) are
$O(x_h - x_{h-1})^3 + O(x_{h+1} - x_h)^3$ in the points satisfying (11), and that the
approximate $\{x_h\}$ must be adjusted by $\sum O(x_h - x_{h-1})^3 = O(x_j - x_{j-1})^2 = K_h$ in
order to satisfy (1), since the members of (1) are of the first order. We deduce
from these results and from the expansion to $\sum (a_iK_i^2 + b_iK_iK_{i+1})$ of $\sum P_h\sigma_h$
about the points satisfying (11) that the approximate and true minimal values
of the sum differ by a sum of $n$ terms each $= O(x_j - x_{j-1})^4$, that is by
$O(x_j - x_{j-1})^3$. The conditions (11) thus generate points $\{x_h\}$ differing from the
true minimal points by terms of greater order than the interval lengths and
result in an approximate minimal $\sum P_h\sigma_h$ differing from the true value by a
term of greater order than the sum itself, i.e., $O(x_j - x_{j-1})$; these differences
should furthermore decrease monotonically as $n$ increases, that is as $(x_j - x_{j-1})$
decreases. These conclusions are immediately extended to comprise even the case
$f(x) = 0$ in the end points, and are therefore valid for any reasonable density
over a finite range. By a truncation argument we may immediately ascertain the
asymptotic equivalence of the sums, when (12), (13) below are used to
approximate (1), even over an infinite range, whereas in this case the two sets of points
do not necessarily converge, as may be seen by taking $f(x) = e^{-x}$.


3. Approximation over an infinite range. If $x_0 = -\infty$, $x_n = +\infty$, the
approximation (11) can still be applied for $h = 2, \ldots, (n - 1)$. The identities
(8) and (9) with $h = 1$ and $(n - 1)$ respectively suggest putting

$$(x_h - x_{h-1})P_h = C_n, \qquad h = 2, \ldots, (n - 1), \tag{12}$$

$$\frac{3f(x_1)}{4}\,\frac{[\sigma_1^2 + (x_1 - \mu_1)^2]^2}{\sigma_1^2} = \frac{3f(x_{n-1})}{4}\,\frac{[\sigma_n^2 + (x_{n-1} - \mu_n)^2]^2}{\sigma_n^2} = C_n, \tag{13}$$

whereby an approximation over the infinite range is obtained. The functions in
the left hand members of (13) depend only on the variables $x_1$ and $x_{n-1}$
respectively and are often convenient analytical expressions, e.g. for $f(x) = e^{-x}$

$$\frac{3f(x_{n-1})}{4}\,\frac{[\sigma_n^2 + (x_{n-1} - \mu_n)^2]^2}{\sigma_n^2} = 3e^{-x_{n-1}}.$$
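The exponential example can be checked by hand. The short derivation below uses the lack-of-memory property of $f(x) = e^{-x}$ (a standard fact, not stated in the paper) to get the conditional mean and variance of the last stratum $(x_{n-1}, \infty)$, and then substitutes them into the right hand expression of (13).

```latex
\mu_n = x_{n-1} + 1, \qquad \sigma_n^2 = 1,
\qquad\text{so}\qquad
\frac{3 f(x_{n-1})}{4}\,
\frac{\left[\sigma_n^2 + (x_{n-1} - \mu_n)^2\right]^2}{\sigma_n^2}
= \frac{3 e^{-x_{n-1}}}{4}\cdot\frac{(1 + 1)^2}{1}
= 3 e^{-x_{n-1}} .
```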


The result (11) may be compared to some analogous results discussed in [2].
In [5] it is proven that $P_h\sigma_h = C_n$ gives an approximate solution to (1). Now
under the assumption of small intervals $(x_h - x_{h-1})$, $f(x)$ may be approximated
by a constant $f(\xi)$ in this interval $(x_{h-1} \le \xi \le x_h)$, in which case

$$\sigma_h = (x_h - x_{h-1})/\sqrt{12}.$$

Therefore this result [5] and (11) are substantially equivalent; each one follows
from the other. (11) might seem advantageous from a computational point of
view. From a theoretical point of view we have at the same time derived (8)
and (9), which permit (12) and (13) to be used in some cases mentioned in the
introduction, where both (11) and the result in [5] give comparatively poor
results.
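The equivalence claimed here can be spelled out in one line. The sketch below only restates the constant-density assumption made in the text; no new result is implied.

```latex
P_h \approx f(\xi)\,(x_h - x_{h-1}), \qquad
\sigma_h \approx \frac{x_h - x_{h-1}}{\sqrt{12}}
\quad\Longrightarrow\quad
P_h \sigma_h \approx \frac{(x_h - x_{h-1})\,P_h}{\sqrt{12}} ,
```

so that holding $P_h\sigma_h$ constant over the strata amounts, to this order, to holding $(x_h - x_{h-1})P_h$ constant, the two constants differing by the factor $\sqrt{12}$.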

In [4] it is shown that $(x_h - x_{h-1}) = C_n$ also gives $\{x_h\}$ approximating (1).
This result follows from (7) above, if there in the series expansions we neglect
even the second terms, obtaining then

$$\frac{[\sigma_h^2 + (x_h - \mu_h)^2]^2}{\sigma_h^2} \approx \frac{4}{3}(x_h - x_{h-1})^2,$$

and the analogous result for the right hand member of (1). It is therefore
reasonable to assume that Aoyama's result gives a poorer approximation (at least
this is so asymptotically) than (11) or the result in [5], although
computationally, of course, $(x_h - x_{h-1}) = C_n$ is better than both of these; it has
nevertheless the disadvantage of being restricted to a finite range.

The approximation $P_h\mu_h = C_n$ has been proposed. We see from (11) that
asymptotically this would imply $(x_h - x_{h-1})/\mu_h$ = a constant, so that the interval
lengths would increase with $h$, whereas the $P_h$ necessarily decrease. This
so-called principle of equipartition might therefore be of use when dealing with
decreasing $f(x)$, e.g. of exponential type.


4. First approximation of the constant $C_n$. With a knowledge of $C_n$ the set
$\{x_h\}$ satisfying (11) or (12), (13) may easily be found, which set then
approximately satisfies (1). The necessity of at least an approximation to $C_n$ arises.

We shall derive an approximation of $C_n$ under the assumption that $C_{n-r}$ has
already been obtained; this will be done by the following heuristic argument.
Suppose that for a density with a finite range $(x_n - x_0)$ a set of points $x_1, \ldots,
x_{n-1}$ has been found such that the relations (11) are satisfied (there is always a
unique such solution, as can be seen immediately). The left side of the identity

$$\sum_{h=1}^{n} \frac{(x_h - x_{h-1})}{(x_n - x_0)}\,P_h = \frac{nC_n}{(x_n - x_0)} \tag{14}$$

may be considered as a weighted mean either of the $(x_h - x_{h-1})/(x_n - x_0)$
weighted by the $P_h$ or vice versa. We noticed above that Aoyama's
approximation $(x_h - x_{h-1}) = C$ = a constant is asymptotically correct when all terms
but the first are neglected in (5) and (6), so that for large $n$

$$[(x_h - x_{h-1})/(x_n - x_0)] \approx 1/n,$$

and (14) becomes

$$\frac{1}{n} \approx \frac{nC_n}{(x_n - x_0)},$$

that is:

$$C_n \approx \frac{(x_n - x_0)}{n^2}. \tag{15}$$

As a first approximation $C'_n$ to $C_n$ the expression

$$C'_n = \frac{(n - r)^2\,C_{n-r}}{n^2} \tag{16}$$

may therefore be used.

Of course other first approximations $C'_n$ might be conceived of. There is for
example some logic in adjusting $C'_n$ as given by (16) by the factor $C_{n-r}/C'_{n-r}$,
thereby obtaining

$$C'_n = \frac{(n - r)^2\,C_{n-r}^2}{n^2\,C'_{n-r}}. \tag{17}$$
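The two starting values (16) and (17) are simple enough to state as code. The sketch below is illustrative only: the function names are hypothetical, and the inputs are the rounded constants printed in Table I, so the results match the tabulated first approximations only up to rounding.

```python
def first_approx_16(c_prev, n, r=1):
    """Eq. (16): C'_n = (n - r)^2 * C_{n-r} / n^2."""
    return (n - r) ** 2 * c_prev / n ** 2


def first_approx_17(c_prev, c_prev_first, n, r=1):
    """Eq. (17): eq. (16) multiplied by the factor C_{n-r} / C'_{n-r}."""
    return first_approx_16(c_prev, n, r) * c_prev / c_prev_first


# f(x) = 2(1 - x): start n = 3 from the tabulated C_2 = 0.2361.
c3_first = first_approx_16(0.2361, n=3)

# f(x) = x e^{-x}: start n = 4 from the tabulated final and first values for n = 3.
c4_first = first_approx_17(0.6371, 0.5920, n=4)
```

With these inputs `c3_first` is about 0.1049 and `c4_first` about 0.386, close to the first approximations listed for those cases in Table I.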


5. Adaption to numerical computation. The above results (11) and (12),
(13), together with (16), (17) suggest a reasonably simple method of finding a
set of points $\{x_h\}$ which approximately satisfy the identities (1), that is, which
approximately minimize the sum $\sum_1^n P_h\sigma_h$ for a given density. The method to
be described is applicable to the case of a finite range; in the general case the
identities (13) are used instead of $(x_1 - x_0)P_1 = (x_n - x_{n-1})P_n = C_n$.

Assuming that the set $\{x_h\}$ satisfying (11) has been found for some smaller
number of intervals $n - r$ (e.g. for $n = 2$, $x_1$ may be found by trial and error), a first
approximation to $C_n$ is obtained by (16) or (17). A set $\{x_h\}$ may be found such that
(in general) all $(x_h - x_{h-1})P_h$ but one are equal to $C'_n$. Let us assume that
$(x_j - x_{j-1})P_j \ne C'_n$ (e.g. $j = n$). Then a second approximation $C''_n$ to $C_n$ may be
obtained by putting

$$C''_n = \frac{(n - 1)C'_n + (x_j - x_{j-1})P_j}{n}. \tag{18}$$

Proceeding in the same manner a set $\{x_h\}$ such that all $(x_h - x_{h-1})P_h$ but
one are equal to $C''_n$ may be found, and a new approximation $C'''_n$ to $C_n$ is obtained
by an analogous formula to (18), etc. The $C_n^{(i)}$ thus obtained converge to $C_n$,
and the sets $\{x_h^{(i)}\}$ correspondingly to the set $\{x_h\}$ satisfying (11).
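The iteration just described can be sketched in a few lines for a finite range. This is a minimal illustration under stated assumptions, not the author's computing procedure: the points are located by bisection on the distribution function rather than by table look-up, the odd interval is always taken to be the last one ($j = n$), and the starting value is taken from (15) when none is supplied. The names `next_point` and `equal_product_points` are hypothetical.

```python
def next_point(F, a, c, upper, tol=1e-12):
    """Return x in (a, upper] with (x - a) * (F(x) - F(a)) = c, found by bisection.

    If even the whole remaining range gives a product below c, return `upper`.
    """
    def excess(x):
        return (x - a) * (F(x) - F(a)) - c

    if excess(upper) <= 0.0:
        return upper
    lo, hi = a, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def equal_product_points(F, x0, xn, n, c_first=None, max_sweeps=100, tol=1e-10):
    """Iterate eq. (18) until all n products (x_h - x_{h-1}) * P_h agree."""
    c = (xn - x0) / n ** 2 if c_first is None else c_first   # eq. (15) as a start
    for _ in range(max_sweeps):
        pts = [x0]
        for _ in range(n - 1):                # first n - 1 intervals have product c
            pts.append(next_point(F, pts[-1], c, xn))
        last = (xn - pts[-1]) * (F(xn) - F(pts[-1]))   # product of the odd interval
        c_new = ((n - 1) * c + last) / n               # eq. (18) with j = n
        converged = abs(c_new - c) < tol
        c = c_new
        if converged:
            break
    return c, pts[1:]


F = lambda x: 1.0 - (1.0 - x) ** 2     # distribution function of f(x) = 2(1 - x)
c3, points3 = equal_product_points(F, 0.0, 1.0, n=3)
```

For $f(x) = 2(1 - x)$ and $n = 3$ this settles near $C_3 \approx 0.103$ with points near 0.242 and 0.531, in agreement with the second approximation reported in Table I.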


6. Numerical examples. The method described above has been applied to
three densities, whereof two are over a semi-finite range. For $n = 2$ the point $x_1$
has been found by trial and error. As first approximations $C'_n$, (16) with $r = 1$
has been used in all cases but one ($f(x) = xe^{-x}$, $n = 4$), where (17) with $r = 1$
was used. The points $\{x_h\}$ have been found to three decimals; as a rule a
comparable degree of accuracy will not be necessary, at least for the first
approximation.

The results are summarized in Table I. The results under Min. are the points
satisfying the exact minimal conditions (1), with reservation for the last decimal,
which may differ by a unit from the true value. The exact minimal variances are
given in the last column.





We note that both $f(x) = 2(1 - x)$ and $f(x) = xe^{-x}$ are densities of the type
mentioned in the introduction as being not quite suitable for application of
equations (11); here better approximations can be obtained by using (13) for
$h = n$ in the first, $h = 1$ (and of course $h = n$) in the second case.

TABLE I

f(x) = 2(1 - x)

| Number of intervals | Approximation | x_1   | x_2   | x_3   | x_4   | x_5   | C_n    | (ΣP_hσ_h)² |
| 2                   | 1             | 0.382 | 1.000 |       |       |       |        | 0.01517    |
|                     | min.          | 0.354 | 1.000 |       |       |       | 0.2361 | 0.01505    |
| 3                   | 1             | 0.244 | 0.528 | 1.000 |       |       | 0.1049 | 0.00692    |
|                     | 2             | 0.242 | 0.531 | 1.000 |       |       | 0.1030 | 0.00693    |
|                     | min.          | 0.230 | 0.503 | 1.000 |       |       | 0.1029 | 0.00688    |
| 4                   | 1             | 0.178 | 0.379 | 0.613 | 1.000 |       | 0.0579 | 0.00395    |
|                     | 2             | 0.177 | 0.376 | 0.615 | 1.000 |       | 0.0573 | 0.00395    |
|                     | min.          | 0.171 | 0.361 | 0.588 | 1.000 |       | 0.0573 | 0.00393    |
| 5                   | 1             | 0.141 | 0.294 | 0.466 | 0.668 | 1.000 | 0.0367 | 0.00255    |
|                     | 2             | 0.140 | 0.292 | 0.463 | 0.669 | 1.000 | 0.0364 | 0.00255    |
|                     | min.          | 0.136 | 0.283 | 0.448 | 0.644 | 1.000 | 0.0365 | 0.00254    |

f(x) = e^{-x}

| Number of intervals | Approximation | x_1   | x_2   | x_3   | x_4   | x_5 | C_n    | (ΣP_hσ_h)² |
| 2                   | 1             | 1.233 | ∞     |       |       |     |        | 0.2856     |
|                     | min.          | 1.262 | ∞     |       |       |     | 0.8737 | 0.2855     |
| 3                   | 1             | 0.742 | 2.045 | ∞     |       |     | 0.3883 | 0.1334     |
|                     | 2             | 0.766 | 1.991 | ∞     |       |     | 0.4095 | 0.1333     |
|                     | 3             | 0.763 | 1.997 | ∞     |       |     | 0.4071 | 0.1333     |
|                     | min.          | 0.764 | 2.026 | ∞     |       |     | 0.4071 | 0.1332     |
| 4                   | 1             | 0.545 | 1.295 | 2.572 | ∞     |     | 0.2292 | 0.0769     |
|                     | 2             | 0.553 | 1.317 | 2.547 | ∞     |     | 0.2349 | 0.0769     |
|                     | min.          | 0.551 | 1.315 | 2.577 | ∞     |     | 0.2345 | 0.0768     |
| 5                   | 1             | 0.430 | 0.977 | 1.730 | 2.995 | ∞   | 0.1501 | 0.0500     |
|                     | 2             | 0.433 | 0.985 | 1.748 | 2.982 | ∞   | 0.1522 | 0.0500     |
|                     | min.          | 0.431 | 0.982 | 1.746 | 3.008 | ∞   | 0.1521 | 0.0500     |

f(x) = xe^{-x}

| Number of intervals | Approximation | x_1   | x_2   | x_3   | x_4   | x_5 | C_n    | (ΣP_hσ_h)² |
| 2                   | 1             | 2.125 | ∞     |       |       |     |        | 0.6420     |
|                     | min.          | 2.291 | ∞     |       |       |     | 1.3319 | 0.6370     |
| 3                   | 1             | 1.423 | 2.976 | ∞     |       |     | 0.5920 | 0.3125     |
|                     | 2             | 1.467 | 3.109 | ∞     |       |     | 0.6328 | 0.3092     |
|                     | 3             | 1.472 | 3.123 | ∞     |       |     | 0.6368 | 0.3090     |
|                     | min.          | 1.571 | 3.252 | ∞     |       |     | 0.6371 | 0.3075     |
| 4                   | 1             | 1.175 | 2.304 | 3.949 | ∞     |     | 0.3859 | 0.1809     |
|                     | 2             | 1.157 | 2.250 | 3.885 | ∞     |     | 0.3720 | 0.1811     |
|                     | min.          | 1.234 | 2.324 | 3.915 | ∞     |     | 0.3696 | 0.1804     |
| 5                   | 1             | 0.955 | 1.785 | 2.792 | 4.277 | ∞   | 0.2365 | 0.1191     |
|                     | 2             | 0.960 | 1.796 | 2.814 | 4.327 | ∞   | 0.2396 | 0.1189     |
|                     | 3             | 0.961 | 1.798 | 2.818 | 4.336 | ∞   | 0.2401 | 0.1189     |
|                     | min.          |       | 1.850 | 2.876 | 4.425 | ∞   | 0.2401 | 0.1185     |


7. Note on the method used in obtaining the exact minimal points. The
method described above gives in many cases a fairly good approximate solution
to the problem of minimizing $\sum P_h\sigma_h$, from the point of view of generating
points $\{x_h\}$ fairly close to the true minimal values. When this is the case, the
"usual" iterative procedure of finding successively better approximations
converging to the true values may be employed, which method will be briefly
reviewed for completeness.

To this end denote by $A_h$ the left hand member of (1), $B_{h+1}$ the right hand
member, that is:

$$A_h = \frac{\sigma_h^2 + (x_h - \mu_h)^2}{\sigma_h}, \qquad B_{h+1} = \frac{\sigma_{h+1}^2 + (x_h - \mu_{h+1})^2}{\sigma_{h+1}}.$$

The minimal conditions (1) then take on the form

$$A_h - B_{h+1} = 0, \qquad h = 1, \ldots, (n - 1).$$
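To make the conditions concrete, the following sketch evaluates the residuals $A_h - B_{h+1}$ for a given set of points. It assumes the example density $f(x) = 2(1 - x)$ and midpoint-rule integration; the function names are illustrative and not taken from the paper.

```python
import math


def stratum_moments(f, lo, hi, steps=4000):
    """Midpoint-rule P, mu and sigma of the density f restricted to (lo, hi)."""
    w = (hi - lo) / steps
    p = m1 = m2 = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * w
        fx = f(x) * w
        p += fx
        m1 += x * fx
        m2 += x * x * fx
    mu = m1 / p
    return p, mu, math.sqrt(m2 / p - mu * mu)


def residuals(f, points):
    """A_h - B_{h+1} at each interior point of points = [x_0, x_1, ..., x_n]."""
    out = []
    for h in range(1, len(points) - 1):
        _, mu_l, sd_l = stratum_moments(f, points[h - 1], points[h])
        _, mu_r, sd_r = stratum_moments(f, points[h], points[h + 1])
        a_h = (sd_l ** 2 + (points[h] - mu_l) ** 2) / sd_l
        b_next = (sd_r ** 2 + (points[h] - mu_r) ** 2) / sd_r
        out.append(a_h - b_next)
    return out


f = lambda x: 2.0 * (1.0 - x)
res_min = residuals(f, [0.0, 0.354, 1.0])      # tabulated minimal point, n = 2
res_approx = residuals(f, [0.0, 0.382, 1.0])   # approximate point for n = 2
```

At the tabulated minimal point the residual is essentially zero (about -0.0005, consistent with three-decimal rounding), while at the approximate point 0.382 it is about 0.06.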


The conditions (11) or (12), (13) have resulted in approximative values of these
expressions, which may be denoted by $A_h^{(0)}$, $B_{h+1}^{(0)}$.

Taylor expansions of $A_h$ and $B_{h+1}$ about $x_{h-1}^{(0)}$, $x_h^{(0)}$ and $x_h^{(0)}$, $x_{h+1}^{(0)}$ respectively,
have then the form

$$A_h(x_{h-1}^{(0)} + K_{h-1},\, x_h^{(0)} + K_h) = A_h^{(0)} + \left(\frac{\partial A_h}{\partial x_{h-1}}\right)_0 K_{h-1} + \left(\frac{\partial A_h}{\partial x_h}\right)_0 K_h,$$

$$B_{h+1}(x_h^{(0)} + K_h,\, x_{h+1}^{(0)} + K_{h+1}) = B_{h+1}^{(0)} + \left(\frac{\partial B_{h+1}}{\partial x_h}\right)_0 K_h + \left(\frac{\partial B_{h+1}}{\partial x_{h+1}}\right)_0 K_{h+1},$$

if terms of second order or higher are neglected, and where the subscript 0
denotes the value of the partial derivatives in $x_h^{(0)}$, etc. We should now attempt
to find $\{K_h\}$ so that these expressions equalize, that is, we should solve the
system of linear equations:

$$\left(\frac{\partial A_h}{\partial x_{h-1}}\right)_0 K_{h-1} + \left[\left(\frac{\partial A_h}{\partial x_h}\right)_0 - \left(\frac{\partial B_{h+1}}{\partial x_h}\right)_0\right] K_h - \left(\frac{\partial B_{h+1}}{\partial x_{h+1}}\right)_0 K_{h+1} = B_{h+1}^{(0)} - A_h^{(0)}, \qquad h = 1, \ldots, (n - 1), \tag{a}$$

whereafter $K_h$ is added to $x_h^{(0)}$ to obtain the new approximative $x_h$. The matrix
of this equation will be positive definite if the first approximation is good, since this
matrix is positive definite in the exact minimal points. The system may be solved in a
relatively simple manner by first finding for example $K_1$ by Cramer's formula, and
by then successively determining $K_2, K_3, \ldots$ from the equations for $h = 1, 2, \ldots$.
It may be noted that the matrix is a so-called continuant matrix, and
has a simple recursive form discussed in the literature. Equations (a) are of
course used iteratively (if necessary), once the first set of $\{K_h\}$ has been found,
etc.
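For $n = 2$ the system (a) reduces to a single equation in $K_1$, which makes the correction step easy to illustrate. The sketch below is a simplified stand-in for the procedure in the text: it assumes the example density $f(x) = 2(1 - x)$, and it replaces the analytic partial derivatives by a central finite difference. All function names are hypothetical.

```python
import math


def a_minus_b(f, x0, x1, x2, steps=4000):
    """A_1 - B_2 for the two strata (x0, x1) and (x1, x2), by the midpoint rule."""
    def mean_sd(lo, hi):
        w = (hi - lo) / steps
        p = m1 = m2 = 0.0
        for i in range(steps):
            x = lo + (i + 0.5) * w
            fx = f(x) * w
            p += fx
            m1 += x * fx
            m2 += x * x * fx
        mu = m1 / p
        return mu, math.sqrt(m2 / p - mu * mu)

    mu1, sd1 = mean_sd(x0, x1)
    mu2, sd2 = mean_sd(x1, x2)
    return (sd1 ** 2 + (x1 - mu1) ** 2) / sd1 - (sd2 ** 2 + (x1 - mu2) ** 2) / sd2


def correct_point(f, x0, x1, x2, sweeps=1, eps=1e-5):
    """Apply the one-unknown form of (a): K_1 = (B_2 - A_1) / d(A_1 - B_2)/dx_1."""
    for _ in range(sweeps):
        g = a_minus_b(f, x0, x1, x2)
        slope = (a_minus_b(f, x0, x1 + eps, x2)
                 - a_minus_b(f, x0, x1 - eps, x2)) / (2.0 * eps)
        x1 += -g / slope
    return x1


f = lambda x: 2.0 * (1.0 - x)
one_step = correct_point(f, 0.0, 0.382, 1.0, sweeps=1)
three_steps = correct_point(f, 0.0, 0.382, 1.0, sweeps=3)
```

Starting from the approximate point 0.382, a single correction already lands within about 0.001 of the tabulated minimal point 0.354, which matches the observation in the text that one application of (a) is usually enough.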








It remains to find the expressions for the partial derivatives appearing in the
equations (a). We first multiply both numerator and denominator of $A_h$ by $P_h$.
From the expressions for $P_h[\sigma_h^2 + (x_h - \mu_h)^2]$ and $P_h[x_h - \mu_h]$ in the function
$H$ and its derivatives, we find immediately

$$\frac{\partial}{\partial x_h}\{P_h[\sigma_h^2 + (x_h - \mu_h)^2]\} = 2P_h[x_h - \mu_h],$$

and by simple derivation

$$\frac{\partial}{\partial x_{h-1}}\{P_h[\sigma_h^2 + (x_h - \mu_h)^2]\} = -(x_h - x_{h-1})^2 f(x_{h-1}).$$

Keeping in mind that the original minimal conditions were derived from

$$\frac{\partial[P_h\sigma_h]}{\partial x_{h-1}} = -\frac{f(x_{h-1})}{2}B_h, \qquad \frac{\partial[P_h\sigma_h]}{\partial x_h} = \frac{f(x_h)}{2}A_h,$$

we then find by simple computation (the procedure is analogous for $B_{h+1}$):

$$\frac{\partial A_h}{\partial x_{h-1}} = \frac{f(x_{h-1})}{P_h\sigma_h}\left[\frac{A_hB_h}{2} - (x_h - x_{h-1})^2\right],$$

$$\frac{\partial A_h}{\partial x_h} = \frac{1}{P_h\sigma_h}\left[2P_h(x_h - \mu_h) - \frac{f(x_h)}{2}A_h^2\right],$$

$$\frac{\partial B_{h+1}}{\partial x_h} = \frac{1}{P_{h+1}\sigma_{h+1}}\left[2P_{h+1}(x_h - \mu_{h+1}) + \frac{f(x_h)}{2}B_{h+1}^2\right], \tag{b}$$

$$\frac{\partial B_{h+1}}{\partial x_{h+1}} = \frac{f(x_{h+1})}{P_{h+1}\sigma_{h+1}}\left[(x_{h+1} - x_h)^2 - \frac{A_{h+1}B_{h+1}}{2}\right].$$

These expressions involve only the functions $A_h$, etc. themselves, and $\mu_h$,
$\sigma_h$, $f(x_h)$, etc., which have already been calculated to obtain $A_h$, etc., and are
therefore not difficult to compute.

Using equations (a) on the approximate values found by (11) for the three
densities above, the exact values (with reservation for the third decimal in a
few cases) were found with only one application of (a). This would seem to imply
that no great error is introduced by neglecting second order terms and higher
in the expansions of $A_h$, $B_{h+1}$, and that application of the above method on the
points satisfying conditions (11) very often provides a reasonably simple and
systematic method of finding points satisfying the minimal conditions (1).

Acknowledgements. I stand in appreciation to Docent Tore Dalenius of 
the Institute of Statistics, University of Stockholm for stating the problem and 
allowing me to examine his article [2] before it appeared in print. 

I am also indebted to the referee and to Mr. J. M. Hammersley for helpful 
suggestions in the revision of the original paper. 





REFERENCES

[1] T. Dalenius, "The problem of optimum stratification," Skand. Aktuarietids., (1950),
pp. 203-213.

[2] T. Dalenius and J. L. Hodges, Jr., "Minimum variance stratification," J. Amer.
Stat. Assn., (1958), (submitted in December 1957).

[3] T. Dalenius and M. Gurney, "The problem of optimum stratification II," Skand.
Aktuarietids., (1951), pp. 133-148.

[4] H. Aoyama, "A study of the stratified random sampling," Ann. Inst. Stat. Math.,
(1954), pp. 1-36.

[5] T. Dalenius and J. L. Hodges, Jr., "The choice of stratification points," (1958),
(Soon to appear in Skand. Aktuarietids.).








TRUNCATION AND TESTS OF HYPOTHESES¹
By Om P. Aggarwal²,³ and Irwin Guttman⁴
Purdue University and Princeton University 


1. Summary. This paper examines the loss of power when using tests based on 
the assumption that the variable being sampled has a “complete” normal dis- 
tribution when in fact the distribution is a “truncated’’ one. The cases consid- 
ered here are for small sample sizes and “symmetric” truncation, while the 
hypothesis considered is the one-sided testing for the mean of a normal distribu- 
tion. Some tables are computed and it appears that an appreciable loss occurs 
only in the size of the test. The loss in power is found to decrease very rapidly 
with the distance of the alternative value of the mean from the one tested and 
also with the distance of the truncation from the mean. 


2. Introduction. In sampling from a normal distribution the assumption that
the random variable $X$ is defined over $(-\infty, \infty)$ is an unrealistic one, and "a
sample of $n$ from a normal distribution" is in reality a sample of $n$ from a
"truncated" normal distribution. This problem has been dealt with from various points
of view in several recent papers (see references). However, one aspect that seems
to have been neglected is that of the tests of hypotheses. We shall attempt to 
examine the results of applying some usual tests of hypotheses to the case when 
the available sample is known to have come from a truncated population. 

We call a normal distribution ‘symmetrically truncated’ at the ‘terminus point’ 
a if its density is given by 


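$$f(x) = \frac{c}{\sigma\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(x - \mu)^2/\sigma^2\right], \quad \text{for } |x - \mu| < a\sigma, \tag{2.1}$$

$$\phantom{f(x)} = 0 \quad \text{otherwise},$$

where $c$ is given by

$$\frac{1}{c} = \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-t^2/2}\,dt. \tag{2.2}$$

We shall confine our attention to the problems of symmetric truncation only,
with $a$ and $\sigma$ known.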
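The constant $c$ in (2.2) has a closed form in terms of the error function, since the standard normal mass on $(-a, a)$ equals $\operatorname{erf}(a/\sqrt{2})$. The sketch below uses only Python's standard `math` module; the function names are illustrative.

```python
import math


def truncation_constant(a):
    """c from (2.2): 1/c is the standard normal mass on (-a, a) = erf(a / sqrt(2))."""
    return 1.0 / math.erf(a / math.sqrt(2.0))


def truncated_normal_pdf(x, mu, sigma, a):
    """Density (2.1): the normal density rescaled by c inside |x - mu| < a*sigma, zero outside."""
    if abs(x - mu) >= a * sigma:
        return 0.0
    z = (x - mu) / sigma
    return truncation_constant(a) * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


c_15 = truncation_constant(1.5)   # truncation at 1.5 standard deviations
```

For $a = 1.5$ this gives $c \approx 1.154$, i.e. about 13% of the normal mass is cut off and redistributed.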


Received January 14, 1958; revised June 30, 1958. 

¹ Research commenced while both authors were at the University of Alberta, under
grant from the University of Alberta General Research Fund.

² Research continued at the Summer Research Institute of the Canadian Mathematical
Congress at Kingston.

³ At present on leave from Purdue University and with the F.A.O. of the United
Nations in Chile.

⁴ Research continued on support by the Office of Naval Research under Contract Nonr
1858 (05).


3. Distribution of sample means. Suppose a sample $X_1, \ldots, X_n$ of size $n$
is available from a distribution of the form (2.1). The sampling distribution of
$\bar X = (1/n)\sum_{i=1}^{n} X_i$ for arbitrary $n$ is very complicated and no general formula
giving the distribution of $\bar X$ explicitly is available. However, by using
convolutions of distributions, it is quite easy to derive the distribution of $\bar X$ for small
values of $n$. The results for $n = 1, 2, 3$ and 4 are given below where without loss
of generality $\mu = 0$, $\sigma = 1$. The density function of $\bar X$ is denoted by $f_n(x)$.

Case $n = 1$. From (2.1) the density is given by

$$f_1(x) = \begin{cases} \dfrac{c}{\sqrt{2\pi}} \exp(-x^2/2), & \text{for } |x| < a, \\ 0 & \text{otherwise}, \end{cases} \tag{3.1}$$

where $c$ is given by (2.2).

Case $n = 2$. Using convolution on (3.1) we obtain

$$f_2(x) = \begin{cases} \dfrac{\sqrt{2}\,c^2}{\pi}\, e^{-x^2} \displaystyle\int_0^{\sqrt{2}(a - |x|)} e^{-t^2/2}\,dt, & \text{for } |x| < a, \\ 0 & \text{otherwise}. \end{cases} \tag{3.2}$$
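The case $n = 2$ can be checked numerically. The sketch below evaluates the density of the mean of two symmetrically truncated standard normal variables in the closed form $c^2 \pi^{-1/2} e^{-x^2} \operatorname{erf}(a - |x|)$, which is obtained here by carrying out the convolution of (3.1) with itself directly (an independent derivation, not copied from the source), and confirms that it integrates to one. The function names are illustrative.

```python
import math


def f2(x, a):
    """Density of the mean of two standard normals, each truncated symmetrically at +-a."""
    if abs(x) >= a:
        return 0.0
    c = 1.0 / math.erf(a / math.sqrt(2.0))        # constant c of (2.2)
    return c * c / math.sqrt(math.pi) * math.exp(-x * x) * math.erf(a - abs(x))


def simpson(g, lo, hi, steps=2000):
    """Composite Simpson rule; `steps` must be even."""
    w = (hi - lo) / steps
    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * w)
    return total * w / 3.0


# f2 is symmetric about 0, so integrate over (0, a) and double.
area = 2.0 * simpson(lambda x: f2(x, 1.5), 0.0, 1.5)
```

As $a$ grows the expression tends to $e^{-x^2}/\sqrt{\pi}$, the $N(0, 1/2)$ density of the mean of two untruncated standard normal variables.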

Case $n = 3$. Convoluting (3.1) and (3.2) it can be verified that


Jb! ine fa eer ta e a 
( 


2x?/2 —V/6/4) (a—z) 


for |z| <a, 


w+) Oe du 


a 


forO0 s | «| = 5? 


ett) at du, 


2x82 


(3.3) f(x) V 60° er ale? Te Ei: i 
( 


—/6/4) (a—z) 0 
for 5 < |z| <a, 
0 otherwise. 


Case $n = 4$. Applying the convolution law to the density (3.2) it is found


4 2(a—|21)  pa/F(a—|(ul2)+2|) fr/2(e—| (u/2)—2]) 
4 —2z2 
—ce 

| ™? 6 


- e uttete) dw dv du, for|z| <a, 


(3.4) fi(z) 


{0 otherwise. 


For sufficiently large $n$, Birnbaum and Andrews [1] have pointed out that $n\bar X$
has a limiting normal distribution. Thus for large $n$ one may obtain an
approximate cumulative distribution of $\bar X$ from (4.2) in [1]. However, in this paper we
shall confine our attention to only those cases where $n \le 4$.


4. Tests of hypotheses under truncation. In this section we consider the effect 
of truncation on size and power of tests of hypotheses concerning the means 
of parent populations. 


























TABLE I
Values of $P_u(\mu)$, $P(\mu, a)$ and loss in power expressed as percentage of $P_u(\mu)$ for $\alpha = 0.05$


Consider a sample of size $n$ from a normal distribution $N(\mu, 1)$. Then a
Uniformly Most Powerful (UMP) test of the one-sided hypothesis testing problem

$$H: \mu = \mu_0, \qquad \text{Alt}: \mu > \mu_0, \tag{4.1}$$

is given by (we assume without loss of generality that $\mu_0 = 0$),

$$\text{Reject } H \text{ if } \bar X > Z_\alpha/\sqrt{n}; \quad \text{accept } H \text{ otherwise}, \tag{4.2}$$

where $Z_\alpha$ is the point exceeded with probability $\alpha$ using the distribution of the
standard normal variable. Now, if sampling from $N_a(\mu, 1)$ where $N_a(\mu, 1)$ is
the density (2.1) with $\sigma = 1$, and test procedure (4.2) is used, the predetermined
size $\alpha$ of this 'usual' test is really not obtained. The actual size is given by
$\alpha' = \Pr(Z_n > Z_\alpha/\sqrt{n})$, where $Z_n$ is the random variable with density function
$f_n(x)$ of the last section.

Now, the 'usual' power function of the test (4.2) is given by

$$P_u(\mu) = \Pr\{\bar X > Z_\alpha/\sqrt{n} \mid \bar X \sim N(\mu, 1/n)\} = \Pr\{Z > Z_\alpha - \mu\sqrt{n} \mid Z \sim N(0, 1)\}, \tag{4.3}$$

if sampling is from a "complete" normal distribution. However, if the sampling
is from a truncated distribution, $N_a(\mu, 1)$, the actual power function of the 'usual'
size $\alpha$ test is given by

$$P(\mu, a) = \Pr\{\bar X > Z_\alpha/\sqrt{n} \mid \bar X \sim f_n(x, \mu)\} = \Pr\{Z_n > Z_\alpha/\sqrt{n} - \mu \mid Z_n \sim f_n(x)\}, \tag{4.4}$$

where $f_n(x, \mu)$ is the density of $\bar X$ when sampling from $N_a(\mu, 1)$, and $Z_n$ is the
random variable with density $f_n(x) = f_n(x, 0)$.
We denote the difference of (4.4) and (4.3) by

$$L(\mu, a) = P_u(\mu) - P(\mu, a). \tag{4.5}$$

For $\mu = 0$, $L$ equals $\alpha - \alpha'$, while for all other values of $\mu$, $L$ is the "loss of
power" if the usual procedure is followed, while sampling is actually from a
truncated distribution. Values of $P_u(\mu)$, $P(\mu, a)$, and the loss in power expressed
as percentage of $P_u(\mu)$ for different values of $\mu$ and four terminus points '$a$'
are given in Table I for $\alpha = 0.05$ and $n = 1, 2, 3$ and 4.

It can be easily verified that (4.5) reduces to

$$L(\mu, a) = \Pr\left\{\frac{Z}{\sqrt{n}} > \frac{Z_\alpha}{\sqrt{n}} - \mu \,\Big|\, Z \sim N(0, 1)\right\} - \Pr\left\{Z_n > \frac{Z_\alpha}{\sqrt{n}} - \mu \,\Big|\, Z_n \sim f_n(x)\right\}, \tag{4.6}$$

and by graphical considerations one may see that $L(\mu, a)$ and $Z_\alpha/\sqrt{n} - \mu$
have the same sign. Thus, as soon as $\mu$ exceeds $Z_\alpha/\sqrt{n}$, there will be a change of
sign from positive to negative in the loss of power, $L(\mu, a)$. This is borne out by
the actual computations in Table I.
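For $n = 1$ the quantities in (4.3) to (4.5) have closed forms, because $Z_1$ is simply a truncated standard normal variable. The sketch below is restricted to that case and uses `statistics.NormalDist` from the Python standard library; the function names are illustrative, and larger $n$ would require the densities $f_n(x)$ of Section 3.

```python
from statistics import NormalDist

STD = NormalDist()   # standard normal distribution


def truncated_sf(x, a):
    """Pr(Z_1 > x) for a standard normal variable truncated symmetrically at +-a."""
    if x >= a:
        return 0.0
    if x <= -a:
        return 1.0
    c = 1.0 / (2.0 * STD.cdf(a) - 1.0)     # constant c of (2.2)
    return c * (STD.cdf(a) - STD.cdf(x))


def usual_power(mu, alpha=0.05, n=1):
    """(4.3): power of the usual test when sampling from a complete normal."""
    z_alpha = STD.inv_cdf(1.0 - alpha)
    return 1.0 - STD.cdf(z_alpha - mu * n ** 0.5)


def actual_power_n1(mu, a, alpha=0.05):
    """(4.4) with n = 1: power of the usual test under truncation at a."""
    z_alpha = STD.inv_cdf(1.0 - alpha)
    return truncated_sf(z_alpha - mu, a)


def loss_n1(mu, a, alpha=0.05):
    """(4.5) with n = 1."""
    return usual_power(mu, alpha) - actual_power_n1(mu, a, alpha)


size_a2 = actual_power_n1(0.0, 2.0)     # actual size alpha' for a = 2
size_a15 = actual_power_n1(0.0, 1.5)    # actual size alpha' for a = 1.5
```

With $a = 2$ the nominal 5% test has actual size of roughly 3%; with $a = 1.5$ and a single observation the critical value $Z_\alpha \approx 1.645$ lies beyond the terminus point, so the usual test can never reject and its size is zero. The sign of `loss_n1` changes from positive to negative once $\mu$ exceeds $Z_\alpha$, as stated in the text.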





TABLE II
Upper $100\alpha\%$ points of $f_n(x)$





Now, suppose the sampling is from $N_a(\mu, 1)$. By applying the Neyman-Pearson
Fundamental Lemma, a UMP test of (4.1) of size $\alpha$ is

$$\text{Reject } H \text{ if } \bar X > K_\alpha(a, n); \quad \text{accept } H \text{ otherwise}, \tag{4.7}$$

where $K_\alpha(a, n)$ is the point exceeded with probability $\alpha$ using the distribution
whose density is $f_n(x)$. Table II gives the significance points for the test (4.7)
for different $n$, $a$ and $\alpha$. That is, if sampling from a truncated normal distribution,
(4.7) gives the 'correct' test for problem (4.1), and Table II gives the correct
significance points for this problem.

The power of this 'correct' test (4.7) is given by

$$P_c(\mu) = \Pr(Z_n > K_\alpha(a, n) - \mu), \tag{4.8}$$

where $Z_n$ is the random variable with density $f_n(x)$. The gain in power,
$G(\mu, a) = P_c(\mu) - P(\mu, a)$, is the gain that would result if one uses the correct test rather
than the usual test. The values of $P_c(\mu)$, $G(\mu, a)$ and the gain in power expressed
as percentage of $P(\mu, a)$ for different $\mu$, $a$ and $n$ and for $\alpha = 0.05$ are given in
Table III.

TABLE III
Values of $P_c(\mu)$, $G(\mu, a)$ and gain in power expressed as percentage of $P(\mu, a)$ for $\alpha = 0.05$

|   | a      | 1.5   | 1.5   | 1.5   | 1.5   | 1.5   | 2.0   | 2.0   | 2.0   | 2.0   | 2.0   |
| n | μ      | 0.5   | 1.0   | 1.5   | 2.0   | 2.5   | 1.0   | 1.5   | 2.0   | 2.5   | 3.0   |
| 1 | P_c    | .1929 | .3968 | .6246 | .8238 | .9601 | .3008 | .5117 | .7108 | .8646 | .9576 |
|   | G      | .1245 | .1744 | .1911 | .1637 | .1095 | .0618 | .0721 | .0655 | .0464 | .0257 |
|   | % Gain | 182.0 | 78.4  | 44.1  | 24.8  | 12.9  | 24.9  | 16.4  | 9.2   | 5.7   | 2.8   |
| 2 | P_c    | .2508 | .5923 | .8761 | .9880 | 1.0   | .4828 | .7684 | .9393 | .9935 | .9999 |
|   | G      | .1460 | .2084 | .1472 | .0458 | .0016 | .0823 | .0695 | .0325 | .0071 | .0002 |
|   | % Gain | 126.4 | 54.3  | 20.2  | 4.9   | 0.2   | 20.5  | 9.9   | 3.6   | 0.7   | -     |
| 3 | P_c    | .3199 | .7450 | .9684 | .9997 | 1.0   | .6220 | .9010 | .9905 | .9999 | 1.0   |
|   | G      | .1666 | .2001 | .0723 | .0046 | -     | .0836 | .0444 | .0085 | .0003 | -     |
|   | % Gain | 108.7 | 36.7  | 8.1   | 0.5   | -     | 15.5  | 5.2   | 0.9   | -     | -     |
| 4 | P_c    | .3836 | .8468 | .9931 | 1.0-  | 1.0   | .7305 | .9611 | .9988 | 1.0-  | 1.0   |
|   | G      | .1859 | .1676 | .0267 | .0002 | -     | .0766 | .0237 | .0016 | -     | -     |
|   | % Gain | 94.0  | 24.7  | 2.8   | -     | -     | 11.7  | 3.6   | 0.2   | -     | -     |

|   | a      | 2.5   | 2.5   | 2.5   | 2.5   | 2.5   | 3.0   | 3.0   | 3.0   | 3.0   | 3.0   |
| n | μ      | .75   | 1.50  | 2.25  | 3.00  | 3.75  | .5    | 1.0   | 2.0   | 3.0   | 4.0   |
| 1 | P_c    | .1958 | .4625 | .7475 | .9256 | .9906 | .1276 | .2627 | .6436 | .9152 | .9924 |
|   | G      | .0144 | .0209 | .0172 | .0081 | .0022 | .0025 | .0039 | .0045 | .0018 | .0003 |
|   | % Gain | 7.9   | 4.7   | 2.4   | 0.9   | 0.2   | 2.0   | 1.5   | 0.7   | 0.2   | -     |
| 2 | P_c    | .2086 | .7124 | .9534 | .9983 | 1.0   | .1773 | .4156 | .8873 | .9965 | 1.0   |
|   | G      | .0244 | .0249 | .0078 | .0005 | -     | .0048 | .0074 | .0047 | .0003 | -     |
|   | % Gain | 8.9   | 3.6   | 0.8   | -     | -     | 2.8   | 1.8   | 0.5   | -     | -     |
| 3 | P_c    | .3881 | .8558 | .9933 | .9990 | 1.0   | .2221 | .5431 | .9689 | .9999 | 1.0   |
|   | G      | .0276 | .0210 | .0016 | -     | -     | .0060 | .0081 | .0014 | -     | -     |
|   | % Gain | 7.7   | 2.5   | 0.2   | -     | -     |       |       |       |       |       |
| 4 | P_c    | .4703 | .9320 | .9992 | 1.0-  | 1.0   | .2644 | .6477 | .9922 | 1.0-  | 1.0   |
|   | G      | .0300 | .0107 | .0003 | -     | -     | .0069 | .0078 | .0004 | -     | -     |
|   | % Gain | 6.8   | 1.2   | -     | -     | -     |       |       |       |       |       |

5. Conclusions. An examination of the tables indicates that serious losses occur
in the size of the test rather than its power. For example, if the truncation occurs
at 1.5 times the standard deviation on either side of the mean, and a usual 5%
significance test is used, one is really using an approximately 1% significance test
rather than 5%. If the truncation occurs at twice the standard deviation on
either side of the mean, the usual 5% significance test gives only approximately
a 3% significance level. Thus one consequence of applying the usual test is to
err on the conservative side in making it much more difficult to reject the hy- 
pothesis. As expected, however, when the truncation is at about three times the 
standard deviation on either side of the mean, there is hardly any difference 
between the usual and the correct test. Even when the truncation occurs at less 
than twice the standard deviation away from the mean, there is not much change 
in the value of the power function beyond one standard deviation away from the 
value of the mean specified by the null hypothesis. Hence it would appear that 
unless there is severe truncation and unless the alternative value of the mean is 
quite near the value specified by the null hypothesis, the usual test would be 
satisfactory. The results given here are only for a usual 5% significance level 
test. It is proposed to give extensive tables of the distribution of the mean of 
samples from truncated distributions and to examine the tests at other than 5% 
significance levels in another paper. 


REFERENCES 


[1] Z. W. Birnbaum and F. C. Andrews, "On sums of symmetrically truncated normal
random variables," Ann. Math. Stat., Vol. 20 (1949), pp. 458-461.

[2] Francis L. Campbell, "A study of truncated bivariate normal distributions,"
Doctoral Dissertation, University of Michigan, (June, 1945).

[3] Douglas G. Chapman, "Estimating the parameters of a truncated gamma
distribution," Ann. Math. Stat., Vol. 27 (1956), pp. 498-506.

[4] A. C. Cohen, Jr., "On estimating the mean and standard deviation of truncated
normal distributions," J. Amer. Stat. Assn., Vol. 44 (1949), pp. 518-525.

[5] A. C. Cohen, Jr., "Estimating the mean and variance of normal populations from
singly truncated and doubly truncated samples," Ann. Math. Stat., Vol. 21 (1950),
pp. 557-569.

[6] A. C. Cohen, Jr., "Estimating parameters of Pearson type III populations from
truncated samples," J. Amer. Stat. Assn., Vol. 45 (1950), pp. 411-423.

[7] A. C. Cohen, Jr., "Estimation of parameters in truncated Pearson frequency
distributions," Ann. Math. Stat., Vol. 22 (1951), pp. 256-265.

[8] A. C. Cohen, Jr., "Estimation in truncated bivariate normal distributions,"
University of Georgia, Mathematical Technical Report No. 2, Contract DA-01-009-
ORD-288, (June, 1953).

[9] A. C. Cohen, Jr., "Estimation in truncated multivariate normal distributions,"
University of Georgia, Mathematical Technical Report No. 3, Contract DA-01-
009-ORD-288 (August, 1953).

[10] A. C. Cohen, Jr., "Restriction and selection in samples from bivariate normal
distributions," J. Amer. Stat. Assn., Vol. 50 (1955), pp. 884-893.

[11] A. Clifford Cohen, Jr., "Restriction and selection in multinormal distributions,"
Ann. Math. Stat., Vol. 28 (1957), pp. 731-741.

[12] A. C. Cohen, Jr., and John Woodward, "Tables of Pearson-Lee-Fisher functions of
singly truncated normal distributions," Biometrics, Vol. 9 (1953), pp. 489-497.

[13] Walter L. Deemer and David F. Votaw, Jr., "Estimation of parameters of
truncated or censored experimental distributions," Ann. Math. Stat., Vol. 26 (1955),
pp. 498-504.

[14] George Gerard den Broeder, "On parameter estimation for truncated Pearson type
III distributions," Ann. Math. Stat., Vol. 26 (1955), pp. 659-663.

[15] D. J. Finney, "The truncated binomial distribution," Ann. Eugenics, Vol. 14 (1949),
pp. 319-328.

[16] V. J. Francis, "On the distribution of the sum of n sample values drawn from a
truncated normal population," J. Roy. Stat. Soc. Suppl., Vol. 8 (1946), pp. 223-232.

[17] A. Hald, "Maximum likelihood estimation of the parameters of a normal distribution
which is truncated at a known point," Skand. Aktuarietids., Vol. 32 (1949), pp.
119-134.

[18] Harold Hotelling, "Fitting generalized truncated normal distributions," Abstracts
of Madison meeting, Ann. Math. Stat., Vol. 19 (1948), p. 596.

[19] Karl Pearson, "On the influence of natural selection on the variability and
correlation of organs," Philos. Trans. Roy. Soc. London Ser. A, Vol. 200 (1903), pp. 1-66.

[20] Des Raj, "Estimation of the parameters of type III populations from truncated
samples," J. Amer. Stat. Assn., Vol. 48 (1953), pp. 336-349.

[21] Des Raj, "On estimating the parameters of bivariate normal populations from doubly
and singly, linearly truncated samples," Sankhya, Vol. 12 (1953), pp. 277-290.

[22] Walter L. Smith, "A note on truncation and sufficient statistics," Ann. Math. Stat.,
Vol. 28 (1957), pp. 247-252.

[23] John W. Tukey, "Sufficiency, truncation and selection," Ann. Math. Stat., Vol. 20
(1949), pp. 309-311.

[24] John W. Tukey, "The Truncated Mean in Moderately Large Samples," Memorandum
Report 32, Statistical Research Group, Princeton University (1949).

[25] D. F. Votaw, Jr., J. A. Rafferty, and W. L. Deemer, "Estimation of parameters in
a truncated trivariate normal distribution," Psychometrika, Vol. 15 (1950), pp.
339-347.








NOTES 


BARTLETT DECOMPOSITION AND WISHART DISTRIBUTION 
By A. M. Kshirsagar
University of Bombay 


1. Introduction. In a recent paper Wijsman [1] presented a method of deriving the Bartlett decomposition and the Wishart distribution by using orthogonal matrices depending upon certain random vectors. That method is simple compared with other methods in the literature. The present paper gives a similar method, depending on orthogonalization of vectors, which is simpler and more direct. This method explicitly gives the $\chi^2$ variables and normal variables in the decomposition of the Wishart distribution in a straightforward way.

The device of writing the Wishart matrix as a product of a triangular matrix 
and its transpose has been used before; see [4] and [6]. In [5], this is done and it is 
shown that the elements of this matrix are independent chi and normal variables. 
However, the present method seems simpler in that it leads to the variables via 
transformations rather than via densities and Jacobians. 


2. Notation and results. The same notation as in Wijsman's paper [1] is used. Let $x_{it}$ ($i = 1, \ldots, k$; $t = 1, \ldots, n$) be independent $N(0, 1)$ variables, forming the $k \times n$ matrix $M_{k,n}$, the $i$th row of which is denoted by $X_i'$ ($i = 1, \ldots, k$), and let

(1)    $A_{k,n} = [a_{ij}] = M_{k,n} (M_{k,n})'$.

Obviously $a_{ij} = X_i' X_j$ ($i, j = 1, \ldots, k$). Orthogonalizing the vectors $X_1, \ldots, X_k$, we get vectors

(2)    $Y_i = X_i - b_{i1} Y_1 - b_{i2} Y_2 - \cdots - b_{i,i-1} Y_{i-1}$,    $i = 1, \ldots, k$,

where $b_{ir}$ ($i = 2, \ldots, k$; $r = 1, \ldots, i - 1$) are so chosen that

(3)    $Y_i' Y_j = 0$,    $i \neq j$;  $i, j = 1, \ldots, k$.

It is therefore easy to see that $b_{ir} = Y_r' X_i / Y_r' Y_r$. Let

(4)    $b_{ir}^* = Y_r' X_i / (Y_r' Y_r)^{1/2} = (Y_r' Y_r)^{1/2} b_{ir}$.

On account of (3), it can be seen that

(5)    $X_i' X_i = Y_i' Y_i + \sum_{r=1}^{i-1} b_{ir}^{*2}$,

       $X_i' X_j = \sum_{r=1}^{i-1} b_{ir}^* b_{jr}^* + b_{ji}^* (Y_i' Y_i)^{1/2}$    for $j > i$.

Received February 4, 1958. 







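This can be written in matrix form as

(6)    $A_{i,n} = B_i B_i'$,    $i = 1, \ldots, k$,

where $B_i$ is the triangular matrix

(7)    $B_i = \begin{bmatrix} (Y_1' Y_1)^{1/2} & 0 & 0 & \cdots & 0 \\ b_{21}^* & (Y_2' Y_2)^{1/2} & 0 & \cdots & 0 \\ b_{31}^* & b_{32}^* & (Y_3' Y_3)^{1/2} & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ b_{i1}^* & b_{i2}^* & b_{i3}^* & \cdots & (Y_i' Y_i)^{1/2} \end{bmatrix}$.

Taking determinants on both sides of (6), we have

(8)    $|A_{i,n}| = (Y_1' Y_1)(Y_2' Y_2) \cdots (Y_i' Y_i)$,    $i = 1, \ldots, k$,

and

(9)    $Y_i' Y_i = |A_{i,n}| / |A_{i-1,n}|$,    $i = 1, \ldots, k$,

where by convention $|A_{0,n}| = 1$.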
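The factorization in (6) can be checked numerically. The following is a minimal sketch in standard-library Python, not part of the original note: the function names `bartlett_factor` and `gram` and the sizes k = 4, n = 10 are illustrative assumptions. It orthogonalizes the rows of a random matrix as in (2), fills a lower-triangular array with the normalized coefficients of (4) and the square roots of the $Y_i' Y_i$, and compares the product of that array with its transpose against the matrix of inner products $X_i' X_j$.

```python
import math
import random

def bartlett_factor(rows):
    """Gram-Schmidt on the row vectors X_1..X_k (hypothetical helper).

    Returns (B, Y): B is lower triangular with (Y_i'Y_i)^(1/2) on the diagonal
    and the normalized coefficients of (4) below it; Y holds the orthogonal vectors.
    """
    k = len(rows)
    Y = []
    B = [[0.0] * k for _ in range(k)]
    for i, x in enumerate(rows):
        y = list(x)
        for r in range(i):
            norm_r = math.sqrt(sum(a * a for a in Y[r]))
            B[i][r] = sum(a * c for a, c in zip(Y[r], x)) / norm_r  # normalized coefficient
            coef = B[i][r] / norm_r                                 # b_ir = Y_r'X_i / Y_r'Y_r
            y = [yi - coef * yr for yi, yr in zip(y, Y[r])]
        Y.append(y)
        B[i][i] = math.sqrt(sum(a * a for a in y))
    return B, Y

def gram(rows):
    """Matrix of inner products of the given row vectors."""
    return [[sum(a * c for a, c in zip(u, w)) for w in rows] for u in rows]

random.seed(0)
k, n = 4, 10                      # illustrative sizes, not from the paper
X = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
B, Y = bartlett_factor(X)
A = gram(X)                       # a_ij = X_i'X_j
BBt = gram(B)                     # inner products of the rows of B give B B'
max_err = max(abs(A[i][j] - BBt[i][j]) for i in range(k) for j in range(k))
print("largest entry of |A - BB'|:", max_err)
```

The printed discrepancy should be at the level of floating-point rounding. Classical Gram-Schmidt is used here only because it mirrors (2) directly; for large or ill-conditioned inputs a Cholesky or QR routine would be the more stable choice.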

Consider the conditional distribution of $b_{ir}^*$ ($r = 1, \ldots, i - 1$) when $X_1, \ldots, X_{i-1}$ are fixed. The $b_{ir}^*$ are $i - 1$ orthogonal (on account of (3)) linear functions of the independent $N(0, 1)$ variables $x_{i1}, x_{i2}, \ldots, x_{in}$ and

    $E(b_{ir}^*) = 0$,

    $V(b_{ir}^*) = Y_r' Y_r / Y_r' Y_r = 1$.

Therefore the $b_{ir}^*$ ($r = 1, \ldots, i - 1$) are independent $N(0, 1)$ and by Fisher's lemma (see [2]),

    $Y_i' Y_i = X_i' X_i - \sum_{r=1}^{i-1} b_{ir}^{*2}$

is a $\chi^2$ variable with $n - (i - 1)$ degrees of freedom. This is independent of the normal distribution of $b_{ir}^*$. Since the fixed variables $X_1, \ldots, X_{i-1}$ do not appear in the distribution of $Y_i' Y_i$ and $b_{ir}^*$ ($r = 1, \ldots, i - 1$), and since the result is true for every $i$, we get the following result:

$Y_i' Y_i$ ($i = 1, \ldots, k$) are independent $\chi^2$ variables with $n - (i - 1)$ degrees of freedom, and $b_{ir}^*$ ($i = 2, \ldots, k$; $r = 1, \ldots, i - 1$) are independent $N(0, 1)$ variables, and all these variables are independently distributed.
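A small Monte Carlo run gives a rough empirical check of this result. The sketch below is an illustration under assumed settings (k = 3, n = 6, 4000 replications, a fixed seed, and the hypothetical helper `factor`), not a substitute for the proof: it repeats the orthogonalization on fresh normal samples and averages the quantities whose means the result predicts, namely $n - (i - 1)$ for $Y_i' Y_i$ and 1 for the square of each normalized coefficient.

```python
import math
import random

def factor(rows):
    """Return (diag_sq, coeffs): diag_sq[i] = Y_i'Y_i, coeffs[(i, r)] = normalized coefficient."""
    Y, diag_sq, coeffs = [], [], {}
    for i, x in enumerate(rows):
        y = list(x)
        for r in range(i):
            norm_r = math.sqrt(diag_sq[r])
            b = sum(a * c for a, c in zip(Y[r], x)) / norm_r
            coeffs[(i, r)] = b
            y = [yi - (b / norm_r) * yr for yi, yr in zip(y, Y[r])]
        Y.append(y)
        diag_sq.append(sum(val * val for val in y))
    return diag_sq, coeffs

random.seed(1)
k, n, reps = 3, 6, 4000           # illustrative settings
sum_diag = [0.0] * k
sum_b2 = 0.0
for _ in range(reps):
    X = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    d, b = factor(X)
    for i in range(k):
        sum_diag[i] += d[i]
    sum_b2 += sum(val * val for val in b.values()) / len(b)

mean_diag = [s / reps for s in sum_diag]   # expected near n, n - 1, n - 2
mean_b2 = sum_b2 / reps                    # expected near 1
print(mean_diag, mean_b2)
```

Agreement of sample means with the predicted degrees of freedom is only a necessary condition; it does not test independence, which the argument above establishes analytically.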

The density factor in the joint distribution of the $\tfrac{1}{2} k(k + 1)$ variables $Y_i' Y_i$ and $b_{ir}^*$ is therefore

(10)    $C_{k,n} \exp\left[-\tfrac{1}{2} \sum_{i=2}^{k} \sum_{r=1}^{i-1} b_{ir}^{*2}\right] \prod_{i=1}^{k} \left\{ (Y_i' Y_i)^{(n-i+1)/2 - 1} \exp\left[-\tfrac{1}{2} Y_i' Y_i\right] \right\}$,

where

(11)    $C_{k,n}^{-1} = 2^{nk/2} \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\left(\frac{n - i + 1}{2}\right)$.

This density factor can be written, on account of (5) and (8), as

(12)    $C_{k,n} \exp\left[-\tfrac{1}{2} \operatorname{tr} A_{k,n}\right] |A_{k,n}|^{(n-k-1)/2} \prod_{i=1}^{k} (Y_i' Y_i)^{(k-i)/2}$.

If we transform from the variables $Y_i' Y_i$ ($i = 1, \ldots, k$) and $b_{ir}^*$ ($i = 2, \ldots, k$; $r = 1, \ldots, i - 1$) to the $k(k + 1)/2$ distinct elements $a_{ij}$ of $A_{k,n}$, by using the transformation as given by (6), the Jacobian of transformation is (see [3], Theorem 4.1)

    $\prod_{i=1}^{k} (Y_i' Y_i)^{(i-k)/2}$.

Consequently the density factor in the distribution of the elements of $A_{k,n}$ is

(13)    $C_{k,n} |A_{k,n}|^{(n-k-1)/2} \exp\left[-\tfrac{1}{2} \operatorname{tr} A_{k,n}\right]$.

Remarks. The $\tfrac{1}{2} k(k - 1)$ normal variables, $k$ $\chi^2$ variables and the triangular matrix in the Bartlett decomposition, as mentioned by Wijsman [1], are directly given by (6).

Further, from (8), (9) and the distribution of $Y_i' Y_i$ as derived above, it follows immediately that

(i)    $|A_{i,n}| / |A_{i-1,n}|$ ($i = 1, \ldots, k$) are independent $\chi^2$ variables

with $n - (i - 1)$ degrees of freedom. From this, the results of Lemmas 2 and 3 in Wijsman's paper [1] follow very easily.
The author is very grateful to the referee for his valuable suggestions. 


REFERENCES 


[1] R. A. Wijsman, “Random orthogonal transformations,” Ann. Math. Stat., Vol. 28 (1957), pp. 415-422.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton, 1946, page 379.

[3] W. L. Deemer and I. Olkin, “Jacobians of certain matrix transformations used in multivariate analysis,” Biometrika, Vol. 38 (1951), pp. 345-367.

[4] P. C. Mahalanobis, R. C. Bose, and S. N. Roy, “Normalization of statistical variates and the use of rectangular co-ordinates in the theory of sampling distributions,” Sankhyā, Vol. 3 (1937), pp. 1-40.

[5] J. G. Mauldon, “Pivotal quantities for Wishart's and related distributions and a paradox in fiducial theory,” J. Roy. Stat. Soc., Ser. B, Vol. 17 (1955), pp. 79-85.

[6] I. Olkin, “On distribution problems in multivariate analysis,” Inst. of Stat. Mimeo Series, North Carolina, Vol. 43 (1951), pp. 1-126.







SOME BOUNDS ON THE DISTRIBUTION FUNCTIONS OF THE
LARGEST AND SMALLEST ROOTS OF NORMAL
DETERMINANTAL EQUATIONS¹


Ray Mickey


Iowa State College²


While the joint density function of the roots of certain determinantal equations has been obtained [1], [2], [3], the result is sufficiently complex that the marginal distribution functions of these statistics have not, to the author's knowledge, been tabulated. We present here a lower bound on the distribution function of the smallest root and an upper bound on the distribution function of the largest root.

These bounds may be useful in significance tests, since observed values that are not “significant” according to the bounds will certainly not be “significant” with respect to the exact distribution.

Let $S_{ij}$ and $S_{ij}^*$, $i, j = 1, \ldots, k$, be two sample covariance matrices from normal distributions having identical covariance matrices. It is well known [4] that the smallest and largest roots, say $W_1$ and $W_k$, of the equation

    $|S_{ij} - W S_{ij}^*| = 0$

satisfy the inequalities

    $W_1 \leq \dfrac{\sum S_{ij} x_i x_j}{\sum S_{ij}^* x_i x_j} \leq W_k$,    $\sum S_{ij}^* x_i x_j > 0$.

Let $F_i = S_{ii} / S_{ii}^*$. It then follows that

    $W_1 \leq \min \{F_i\}$;    $W_k \geq \max \{F_i\}$.

Since the roots are invariant under linear transformations of the underlying variables, the covariance matrix may be taken to be the identity matrix. Then the $F_i$ are independently and identically distributed according to the well known $F$ distribution. Denote by $F_{(1)}$ and $F_{(k)}$ the smallest and the largest of a set of $k$ independently identically distributed $F$ values. We then have the desired bounds

    $P\{W_1 \leq u\} \geq P\{F_{(1)} \leq u\}$,
    $P\{W_k \geq v\} \geq P\{F_{(k)} \geq v\}$.

Denote by $G(F)$ the distribution function of $F$ (which depends, of course, on the numbers of degrees of freedom for $S_{ij}$ and $S_{ij}^*$). The above bounds become

    $P\{W_1 \leq u\} \geq 1 - [1 - G(u)]^k$,
    $P\{W_k \geq v\} \geq 1 - [G(v)]^k$.
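Once $G(u)$ and $G(v)$ have been read from a table of the $F$ distribution, the two bounds are a one-line computation. The sketch below is illustrative only: the function name `root_bounds` and the sample inputs are assumptions, and the caller is assumed to supply the values of $G$ for the appropriate degrees of freedom.

```python
def root_bounds(G_u, G_v, k):
    """Bounds for the extreme roots from values of the F distribution function.

    G_u = G(u) and G_v = G(v) are supplied by the caller (e.g. from F tables);
    k is the number of variables. Returns a pair:
      lower bound on P{W_1 <= u},  lower bound on P{W_k >= v}.
    """
    if not (0.0 <= G_u <= 1.0 and 0.0 <= G_v <= 1.0):
        raise ValueError("G(u) and G(v) must be probabilities")
    if k < 1:
        raise ValueError("k must be a positive integer")
    smallest_root_bound = 1.0 - (1.0 - G_u) ** k
    largest_root_bound = 1.0 - G_v ** k
    return smallest_root_bound, largest_root_bound

# Illustrative inputs: u at the lower 5% point and v at the upper 5% point of F, k = 3.
low_bound, high_bound = root_bounds(0.05, 0.95, 3)
print(low_bound, high_bound)
```

Both bounds increase with $k$ for fixed $G(u)$ and $G(v)$, which reflects the fact that the minimum and maximum of more $F$ values are more extreme.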


Received April 28, 1958. 
1 Prepared in connection with work under AEC contract AT (30-1) -1377. 
2 Now at General Analysis Corporation, Los Angeles, California. 







REFERENCES 


[1] R. A. Fisher, “The sampling distribution of some statistics obtained from non-linear equations,” Annals of Eugenics, Vol. 9 (1939), pp. 238-249.

[2] P. L. Hsu, “On the distribution of roots of certain determinantal equations,” Annals of Eugenics, Vol. 9 (1939), pp. 250-258.

[3] A. M. Mood, “On the distribution of the characteristic roots of normal second-moment matrices,” Ann. Math. Stat., Vol. 22 (1951), pp. 266-274.

[4] S. N. Roy, “p-Statistics, or some generalizations on the analysis of variance appropriate to multivariate problems,” Sankhyā, Vol. 3 (1939), pp. 341-396.




NOTE ON A MOVING SINGLE SERVER PROBLEM¹


By S. Karlin, R. G. Miller, Jr., and N. U. Prabhu


Stanford University and Karnatak University 


1. Introduction and summary. B. McMillan and J. Riordan in [1] derived the generating function for the probability distribution of the number of items completed before absorption in a moving single server problem in two special cases. Through an analogy to the work of L. Takács [2] on busy period problems for a simple queue, McMillan and Riordan postulated a nonlinear integral equation relation for the generating function. In this note the validity of this relation is proved in general by exploiting the analogy more fully, and the generating function in the two special cases is obtained directly from the integral equation. A similar functional relation is established for the Laplace-Stieltjes transform of the distribution of time until absorption, and the transform is obtained for the two special cases.


2. Functional relations. As stated by McMillan and Riordan the moving single server problem is the following: an assembly line moving with uniform speed has items for service spaced along it. The single server available moves with the line while serving and against it with infinite velocity while transferring service to the next item in line. The line has a barrier in which the server may be said to be “absorbed” in the sense that service is disabled if the server moves into the barrier. The server, whose service time is exponentially distributed with parameter $a$, starts service on the first item when it is $T$ time units away from the barrier. Let the spacings between items be independent random variables with the general distribution function $B(t)$.

This problem is analogous to a simple queue with a single server, Poisson arrivals ($\lambda = a$), and distribution of service times $F(t) = B(t)$. The time until absorption in the moving single server problem is equivalent to the length of a busy period for the simple queue in which the service distribution for the first item in line is

    $F_T(t) = 0$ for $t < T$;    $F_T(t) = 1$ for $t \geq T$.

The number of items completed before absorption is one less than the number of items serviced during a busy period in which the first item has the service distribution $F_T$.


Received April 28, 1958.
¹ This paper represents work done independently by Karlin and Miller and by Prabhu, and combined because of the similarity of the results. The work of the first two authors was sponsored in part by Stanford University Office of Naval Research Contract Nonr-225(28).

Let $p(k, T)$ be the probability that the server completes $k$ items before absorption, and let $P(x, T) = \sum_{k=0}^{\infty} p(k, T) x^k$. Let $f_j$ be the probability that $j$ items are serviced during a busy period in which the first item has service distribution $F$, and let $F(x) = \sum_{j=1}^{\infty} f_j x^j$.

For the queue suppose that $n$ items arrive during the service period (of length $T$) of the first item. The probability distribution of the number of items serviced during the remainder of the busy period is the $n$-fold convolution of $f = \{f_j\}$. Hence,

(1)    $P(x, T) = \sum_{n=0}^{\infty} \dfrac{e^{-aT} (aT)^n}{n!} [F(x)]^n = e^{-aT(1 - F(x))}$,

where $F(x)$ is defined above and is the unique analytic solution to the integral equation

(2)    $F(x) = x \int_0^{\infty} e^{-at(1 - F(x))} \, dB(t)$,    $|x| < 1$,

subject to the condition $F(0) = 0$ (see [2]). Since the integrand in (2) is $P(x, t)$, $P(x, t)$ is the unique analytic solution to the non-linear integral equation

(3)    $P(x, T) = \exp\left\{ -aT \left( 1 - x \int_0^{\infty} P(x, t) \, dB(t) \right) \right\}$,

subject to the condition $|P(x, t)| \leq 1$ for $|x| < 1$, all $t$. This is the integral relation conjectured by McMillan and Riordan.

Let $H(u, T)$ be the probability that the server is absorbed prior to time $u$, and let $A(s, T) = \int_0^{\infty} e^{-su} \, dH(u, T)$. Let $G$ be the distribution of the length of a busy period in which the initial item has the service distribution $F$, and let $G(s) = \int_0^{\infty} e^{-su} \, dG(u)$.

The length of time until absorption is $T$ plus an $n$-fold convolution of busy periods, where $n$ is the number of items arriving in the interval $(0, T)$. Hence

(4)    $H(u, T) = \sum_{n=0}^{\infty} \dfrac{e^{-aT} (aT)^n}{n!} G^{n}(u - T)$ for $u \geq T$;    $H(u, T) = 0$ for $u < T$,

where $G^{n}$ denotes the $n$-fold convolution of $G$. In terms of Laplace-Stieltjes transforms (4) becomes

(5)    $A(s, T) = \exp \{ -T(s + a(1 - G(s))) \}$,

where $G(s)$ is the unique analytic solution to the relation

(6)    $G(s) = \int_0^{\infty} e^{-(s + a(1 - G(s))) t} \, dB(t)$,

subject to the condition $\lim_{s \to \infty} G(s) = 0$ for real $s$ (see [2]). Combination of (5) and (6) implies that $A(s, t)$ is the unique analytic solution to the equation

(7)    $A(s, T) = \exp\left\{ -sT - aT \left( 1 - \int_0^{\infty} A(s, t) \, dB(t) \right) \right\}$,

subject to the conditions $|A(s, t)| \leq 1$ for $\mathrm{Re}\{s\} > 0$, all $t$, and $\lim_{s \to \infty} A(s, t) = 0$ for real $s$, all $t$.


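3. Examples.

(a)    $B(t) = 0$ for $t < \epsilon$;    $B(t) = 1$ for $t \geq \epsilon$.

$P(x, T)$ can be determined either from (1) and (2) or from (3) directly. To determine $P(x, T)$ from (3) let $T = \epsilon$ in (3):

(8)    $P(x, \epsilon) = e^{-a\epsilon(1 - x P(x, \epsilon))}$,

so $P(x, \epsilon)$ satisfies the equation

(9)    $a \epsilon x P(x, \epsilon) \, e^{-a \epsilon x P(x, \epsilon)} = a \epsilon x \, e^{-a\epsilon}$.

The expansion of $e^{Tz/\epsilon}$ for $z = a \epsilon x P(x, \epsilon)$ (see [1]) is

(10)    $e^{Tz/\epsilon} = 1 + \sum_{k=1}^{\infty} \dfrac{(T/\epsilon)(T/\epsilon + k)^{k-1}}{k!} (a \epsilon x e^{-a\epsilon})^k$,

so

(11)    $P(x, T) = e^{-aT} \left[ 1 + \sum_{k=1}^{\infty} \dfrac{T (T + k\epsilon)^{k-1}}{k!} (a x e^{-a\epsilon})^k \right]$.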
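The series solution for constant spacing can be checked against the functional relations it was derived from. The sketch below is a numerical illustration under assumed parameter values ($a = 1$, $\epsilon = 0.5$, $x = 0.6$, $T = 2$) with a hypothetical function name `P_series`; it truncates the series at 80 terms, evaluates it at $T = \epsilon$, and measures how far the result is from satisfying $P(x, \epsilon) = e^{-a\epsilon(1 - xP(x, \epsilon))}$ and, for general $T$, relation (3) with $B$ concentrated at $\epsilon$.

```python
import math

def P_series(x, T, a, eps, terms=80):
    """Truncated series for P(x, T) when the spacing between items is the constant eps."""
    total = 1.0
    for k in range(1, terms + 1):
        coeff = T * (T + k * eps) ** (k - 1) / math.factorial(k)
        total += coeff * (a * x * math.exp(-a * eps)) ** k
    return math.exp(-a * T) * total

# Illustrative parameter values (assumptions, not from the paper).
a, eps, x, T = 1.0, 0.5, 0.6, 2.0

p_eps = P_series(x, eps, a, eps)
# With B a unit step at eps, the integral of P(x, t) dB(t) is just P(x, eps).
residual_at_eps = abs(p_eps - math.exp(-a * eps * (1.0 - x * p_eps)))
residual_at_T = abs(P_series(x, T, a, eps) - math.exp(-a * T * (1.0 - x * p_eps)))
print(p_eps, residual_at_eps, residual_at_T)
```

The series converges geometrically here because $a \epsilon x e^{-a\epsilon}$ is below $1/e$; for parameter values much larger than these, the individual terms should be accumulated in logarithms to avoid overflow.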


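$A(s, T)$ can be determined from (11), from (5) and (6), or from (7). For the latter method let $T = \epsilon$ in (7):

(12)    $a \epsilon A(s, \epsilon) \, e^{-a \epsilon A(s, \epsilon)} = a \epsilon \, e^{-(a + s)\epsilon}$,

so the expansion of $e^{aTA(s, \epsilon)}$ is

(13)    $e^{aTA(s, \epsilon)} = 1 + \sum_{k=1}^{\infty} \dfrac{T (T + k\epsilon)^{k-1}}{k!} (a e^{-(a + s)\epsilon})^k$,

and

(14)    $A(s, T) = e^{-(a + s)T} \left[ 1 + \sum_{k=1}^{\infty} \dfrac{T (T + k\epsilon)^{k-1}}{k!} (a e^{-(a + s)\epsilon})^k \right]$.

(b)    $B(t) = 1 - e^{-\beta t}$,    $t \geq 0$, $\beta > 0$.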


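To determine $P(x, T)$ from (3) integrate both sides of (3) with respect to $dB(T)$ and solve for $\int_0^{\infty} P(x, t) \, dB(t)$:

(15)    $\int_0^{\infty} P(x, t) \, dB(t) = \dfrac{a + \beta - \sqrt{(a + \beta)^2 - 4 a \beta x}}{2 a x}$,

so

(16)    $P(x, T) = \exp\left\{ -\dfrac{T}{2} \left( a - \beta + \sqrt{(a + \beta)^2 - 4 a \beta x} \right) \right\}$.

To determine $A(s, T)$ from (7) integrate both sides of (7) with respect to $dB(T)$ and solve for $\int_0^{\infty} A(s, t) \, dB(t)$:

(17)    $\int_0^{\infty} A(s, t) \, dB(t) = \dfrac{s + a + \beta - \sqrt{(s + a + \beta)^2 - 4 a \beta}}{2a}$,

so

(18)    $A(s, T) = \exp\left\{ -\dfrac{T}{2} \left( s + a - \beta + \sqrt{(s + a + \beta)^2 - 4 a \beta} \right) \right\}$.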
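For exponential spacings the closed forms for $P(x, T)$ and $A(s, T)$ can be substituted back into the integral relations (3) and (7) and the two sides compared numerically. The following sketch does this with a plain trapezoidal rule; the parameter values, the truncation point of the integral, and the function names are assumptions made for illustration.

```python
import math

def P_closed(x, T, a, beta):
    """Closed form of the generating function for B(t) = 1 - exp(-beta t)."""
    root = math.sqrt((a + beta) ** 2 - 4.0 * a * beta * x)
    return math.exp(-0.5 * T * (a - beta + root))

def A_closed(s, T, a, beta):
    """Closed form of the transform of the absorption time, for real s >= 0."""
    root = math.sqrt((s + a + beta) ** 2 - 4.0 * a * beta)
    return math.exp(-0.5 * T * (s + a - beta + root))

def integrate_dB(func, beta, upper=60.0, steps=60000):
    """Trapezoidal approximation of the integral of func(t) dB(t) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * func(t) * beta * math.exp(-beta * t)
    return total * h

# Illustrative parameter values (assumptions, not from the paper).
a, beta, x, s, T = 1.0, 2.0, 0.7, 0.3, 1.5

int_P = integrate_dB(lambda t: P_closed(x, t, a, beta), beta)
residual_3 = abs(P_closed(x, T, a, beta) - math.exp(-a * T * (1.0 - x * int_P)))

int_A = integrate_dB(lambda t: A_closed(s, t, a, beta), beta)
residual_7 = abs(A_closed(s, T, a, beta) - math.exp(-s * T - a * T * (1.0 - int_A)))
print(residual_3, residual_7)
```

Both residuals should be small, limited only by the quadrature error of the trapezoidal rule.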


REFERENCES 
[1] B. McMillan and J. Riordan, “A moving single server problem,” Ann. Math. Stat., Vol. 28 (1957), pp. 471-478.
[2] L. Takács, “Investigation of waiting time problems by reduction to Markov processes,” Acta Mathematica, Acad. Scient. Hung., Vol. 6 (1955), pp. 101-128.




DISTRIBUTION OF THE “BLOCKS ADJUSTED FOR TREATMENTS” 
SUM OF SQUARES IN INCOMPLETE BLOCK DESIGNS 


By A. M. Kshirsagar
Bombay University 


Introduction. Marvin Zelen [1] has stated that the distribution of the ‘“‘blocks 
adjusted for treatments” sum of squares in an incomplete block design is un- 
known. The present paper is intended to derive this distribution. 


Notation and derivation. Let there be $v$ treatments and $b$ blocks having $k_1, k_2, \ldots, k_b$ plots respectively and let the $i$th treatment be replicated $r_i$ times ($i = 1, 2, \ldots, v$). Let $n_{ij}$ (which is either zero or one) be the number of times the $i$th treatment occurs in the $j$th block ($i = 1, 2, \ldots, v$; $j = 1, 2, \ldots, b$). Then


Received May 5, 1958. 





    $N = [n_{ij}]$    ($v \times b$)

is the incidence matrix of the design. Let the total yields of the blocks be denoted by $B_1, B_2, \ldots, B_b$ respectively and the total yields of the treatments by $T_1, T_2, \ldots, T_v$ respectively. Let

    $B = [B_1, B_2, \ldots, B_b]'$,    $T = [T_1, T_2, \ldots, T_v]'$.

We shall assume that the yield of a plot consists of a general effect $\mu$, the effect of the block containing the plot, the effect of the treatment which is applied to the plot and the error component. Let $\alpha_j$ denote the effect of the $j$th block, and let

    $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_b]'$.

It is assumed that the errors are independently and normally distributed with zero mean and variance $\sigma^2$, the block effects are independently and normally distributed with zero mean and variance $\sigma'^2$, and that the block effects and error components are distributed independently of each other.

Let $E_1$, $V_1$, $\mathrm{cov}_1$ denote respectively the expectation, variance-covariance matrix and covariance matrix in the conditional distribution of the yields when the block effects are fixed; and $E$, $V$, $\mathrm{cov}$ will denote the same quantities in the absolute distribution of the yields when the block effects are also normally distributed. Let

    $P = B - N' \, \mathrm{diag}\left( \dfrac{1}{r_1}, \dfrac{1}{r_2}, \ldots, \dfrac{1}{r_v} \right) T$,

where diag stands for a diagonal matrix, the elements in the diagonal being written in the adjoining bracket. $P$ is the vector of the block totals, adjusted for treatment effects. Let

    $D = \mathrm{diag}(k_1, k_2, \ldots, k_b) - N' \, \mathrm{diag}\left( \dfrac{1}{r_1}, \dfrac{1}{r_2}, \ldots, \dfrac{1}{r_v} \right) N$.

Then it can be readily seen that

    $E_1(P) = D\alpha$,    $V_1(P) = \sigma^2 D$.


Since the sum of all the elements in any row (or column) of $D$ is zero, one of its latent roots is zero. If the design is a connected one (i.e., all the treatment contrasts and block contrasts are estimable), the rank of $D$ is $b - 1$. Let the non-zero latent roots of $D$ be $\lambda_1, \lambda_2, \ldots, \lambda_{b-1}$, the corresponding orthogonal normalized latent vectors (column vectors) being $m_1, m_2, \ldots, m_{b-1}$. Then $(D - \lambda_i I) m_i = 0$, $i = 1, 2, \ldots, b - 1$; and hence

    $E_1(m_i' P) = m_i' D \alpha = \lambda_i m_i' \alpha$,    $i = 1, 2, \ldots, b - 1$;
    $V_1(m_i' P) = \sigma^2 m_i' D m_i = \sigma^2 \lambda_i$,    $i = 1, 2, \ldots, b - 1$;
    $\mathrm{cov}_1(m_i' P, m_j' P) = \sigma^2 m_i' D m_j = 0$,    $i \neq j$;  $i, j = 1, 2, \ldots, b - 1$.

Hence the “blocks adjusted for treatments” sum of squares with $b - 1$ degrees of freedom is

    $u = \sum_{i=1}^{b-1} (m_i' P)^2 / \lambda_i$.
t=! 


When the block effects $\alpha_j$ are not fixed but are random variables obeying a normal distribution with zero mean and variance $\sigma'^2$, the means, variances and covariances of $m_i' P$ can be found as below by using Theorems 14 and 15 about expectations, variances and covariances proved in [2].

    $V(m_i' P) = (E V_1 + V E_1)(m_i' P)$
               $= E(\sigma^2 \lambda_i) + V(\lambda_i m_i' \alpha)$
               $= \sigma^2 \lambda_i + \sigma'^2 \lambda_i^2$,    $i = 1, 2, \ldots, b - 1$;

    $\mathrm{cov}(m_i' P, m_j' P) = \mathrm{cov}\{E_1(m_i' P), E_1(m_j' P)\} + E\{\mathrm{cov}_1(m_i' P, m_j' P)\}$
               $= \mathrm{cov}(\lambda_i m_i' \alpha, \lambda_j m_j' \alpha)$    $(i \neq j)$
               $= 0$,    $i, j = 1, 2, \ldots, b - 1$;

and

    $E(m_i' P) = E\{E_1(m_i' P)\} = E(\lambda_i m_i' \alpha) = 0$,    $i = 1, 2, \ldots, b - 1$.

Therefore, if

    $z_i = \dfrac{m_i' P}{(\sigma^2 \lambda_i + \sigma'^2 \lambda_i^2)^{1/2}}$,    $i = 1, 2, \ldots, b - 1$,

then $z_1, z_2, \ldots, z_{b-1}$ are standard normal independent variables and the (adjusted) block sum of squares is

    $u = \sum_{i=1}^{b-1} a_i z_i^2$,

where

    $a_i = \sigma^2 + \lambda_i \sigma'^2$.


The distribution of such a quadratic form has been derived by Herbert Rob- 
bins [3], and Herbert Robbins and E. J. G. Pitman [4] (Theorem 1). 

For a design, the dual of which is a balanced incomplete block design, all the 
non-zero latent roots of D are equal and the distribution of u reduces to a chi- 
square distribution. 
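The equal-roots case can be illustrated on a small design. The sketch below is an assumed example, not taken from the paper: it uses the symmetric BIBD with $v = b = 7$, $r = k = 3$, $\lambda = 1$ generated by the difference set {0, 1, 3} modulo 7, whose dual is again a BIBD. It builds the matrix $D$ = diag$(k_1, \ldots, k_b) - N'$ diag$(1/r_1, \ldots, 1/r_v) N$ in exact rational arithmetic and checks that every row sums to zero and that $D$ equals $(7/3)(I - J/7)$, where $J$ is the matrix of ones, so that the $b - 1$ non-zero latent roots all equal $7/3$.

```python
from fractions import Fraction

# Assumed example design: blocks {j, j+1, j+3} mod 7 (a symmetric BIBD).
v = b = 7
blocks = [sorted({(j + d) % 7 for d in (0, 1, 3)}) for j in range(b)]

# Incidence matrix N (v x b), replications r_i and block sizes k_j.
N = [[1 if i in blocks[j] else 0 for j in range(b)] for i in range(v)]
r = [sum(N[i]) for i in range(v)]
ksize = [sum(N[i][j] for i in range(v)) for j in range(b)]

# D = diag(k_1..k_b) - N' diag(1/r_1..1/r_v) N, computed exactly.
D = [[(ksize[j] if j == l else 0)
      - sum(Fraction(N[i][j] * N[i][l], r[i]) for i in range(v))
      for l in range(b)] for j in range(b)]

row_sums = [sum(row) for row in D]

# lam * (I - J/b) has the single root 0 and the root lam repeated b - 1 times.
lam = Fraction(7, 3)
target = [[lam * ((1 if j == l else 0) - Fraction(1, b)) for l in range(b)]
          for j in range(b)]
equal_roots = (D == target)
print(row_sums, equal_roots)
```

With all non-zero roots equal to a common value $\lambda$, every coefficient $a_i$ reduces to $\sigma^2 + \lambda \sigma'^2$, so $u$ is that constant times a chi-square variable with $b - 1$ degrees of freedom.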

When the design is not a connected one, some of the \’s will be zero and the 
necessary changes can be easily made to suit that situation. 

I am indebted to Prof. M. C. Chakrabarti, Mr. B. V. Shah of the Bombay 
University and the referee for valuable suggestions in the preparation of this 
paper. 

REFERENCES 

[1] Marvin Zelen, “Analysis of incomplete block designs,” J. Amer. Stat. Assn., Vol. 52 (1957), p. 204.

[2] M. H. Hansen, W. N. Hurwitz, and W. G. Madow, Sample Survey Methods and Theory, Vol. 2, John Wiley and Sons, New York, pp. 63-65.

[3] Herbert Robbins, “The distribution of a definite quadratic form,” Ann. Math. Stat., Vol. 19 (1948), p. 266.

[4] Herbert Robbins and E. J. G. Pitman, “Application of the method of mixtures to quadratic forms in normal variables,” Ann. Math. Stat., Vol. 20 (1949), p. 552.




A SERIES OF SYMMETRICAL GROUP DIVISIBLE INCOMPLETE 
BLOCK DESIGNS 


By D. A. Sprott
Waterloo College, Ontario, Canada 


Introduction. A balanced incomplete block design (BIBD) is an arrangement of $v$ elements in $b$ blocks of $k$ different elements each, so that each element occurs in $r$ blocks and each pair of elements occurs in $\lambda$ blocks. If $v = b$ ($r = k$), the design is said to be symmetric; it is well known that any two blocks of a symmetric BIBD have exactly $\lambda$ elements in common.

It has been shown [1] that the subspaces of dimension $t$ of PG($m$, $p^n$) form a BIBD. It is also known that PG($m$, $p^n$) contains PG($m$, $p^k$) if $k$ is a factor of $n$ (see [2]). In particular, the lines of PG(2, $s^2$) form the design $v = b = s^4 + s^2 + 1$, $r = k = s^2 + 1$, $\lambda = 1$, which therefore contains the design $v = b = s^2 + s + 1$, $r = k = s + 1$, $\lambda = 1$ (the lines of PG(2, $s$)), where $s = p^n$.


Received March 18, 1957; revised November 6, 1958. 







A group divisible incomplete block design (GD design) has been defined [3] as an arrangement of $v$ elements in $b$ blocks of $k$ different elements, in which the elements can be divided into $m$ groups of $n$ elements each, so that two elements belonging to the same group occur together in $\lambda_1$ blocks and two elements belonging to different groups occur together in $\lambda_2$ blocks. It will be shown that a series of GD designs can be obtained from the preceding series of BIBD's.

Series of symmetrical group divisible designs. Consider the BIBD (a): $v_1 = b_1 = s^4 + s^2 + 1$, $r_1 = k_1 = s^2 + 1$, $\lambda = 1$. Let set I be the set of blocks of (a) that contains the blocks of the BIBD (b): $v_2 = b_2 = s^2 + s + 1$, $r_2 = k_2 = s + 1$, $\lambda = 1$. Let set II be the remaining blocks of design (a); let set III be the set of blocks remaining in design (a) when all blocks and elements of design (b) have been deleted. Thus set I contains $b_2 = s^2 + s + 1$ blocks, set II contains $b_1 - b_2 = s^4 - s$ blocks, and set III contains $b = v = s^4 - s$ blocks and elements.

Lemma 1. Every element of set III occurs in exactly $s^2$ blocks of set III.

Proof. Since (b) is a symmetric BIBD, any two blocks of set I have exactly one of the $v_2 = s^2 + s + 1$ elements in common. Thus the remaining $(k_1 - k_2) b_2 = s^4 - s$ elements of set I are all different, and they are also different from the $v_2$ elements of design (b). Since $s^4 - s = v_1 - v_2 = v$, where $v$ is the number of elements in set III, each such element must occur in exactly one block of set I, and so must occur in exactly $r_1 - 1 = s^2$ blocks of set III.

Lemma 2. Every block of set II contains exactly one element of design (b).

Proof. Every pair of elements of design (b) occurs once in set I, and so cannot occur at all in set II. Therefore all elements of design (b) occur in different blocks of set II, each occurring $r_1 - r_2 = s^2 - s$ times. Thus the number of blocks of set II containing exactly one element of design (b) is $v_2 (r_1 - r_2) = s^4 - s$, which is the total number of blocks of II.

Combining these lemmas, it can easily be seen that set III contains $v = s^4 - s$ elements in $b = s^4 - s$ blocks, where each block contains $k = k_1 - 1 = s^2$ elements and each element occurs $r = r_1 - 1 = s^2$ times.

Since any element of set III occurs once in set I with $k_1 - k_2 - 1$ other elements of set III, it will not occur again with any of these elements and will occur once in set III with each of the remaining elements. Thus set III is a regular GD design with parameters

    $v = b = s^4 - s$,  $r = k = s^2$,  $\lambda_1 = 0$,
    $\lambda_2 = 1$,  $m = s^2 + s + 1$,  $n = s^2 - s$.
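The parameter set can be tabulated and checked against the standard counting identities for a GD design, namely $v = mn$, $vr = bk$, and $r(k - 1) = (n - 1)\lambda_1 + n(m - 1)\lambda_2$. The sketch below is arithmetic only; it does not construct the design, and the function name `gd_parameters` is an assumption made for illustration.

```python
def gd_parameters(s):
    """Parameters of the GD design derived from the lines of PG(2, s^2) and PG(2, s)."""
    v1 = b1 = s ** 4 + s ** 2 + 1      # design (a)
    r1 = k1 = s ** 2 + 1
    v2 = b2 = s ** 2 + s + 1           # design (b)
    r2 = k2 = s + 1
    return {
        "v": v1 - v2, "b": b1 - b2,    # elements and blocks left in set III
        "r": r1 - 1, "k": k1 - 1,
        "lambda1": 0, "lambda2": 1,
        "m": b2,                       # one group per block of set I
        "n": k1 - k2,                  # set III elements in each block of set I
    }

def satisfies_gd_identities(p):
    """Check v = mn, vr = bk and r(k-1) = (n-1)*lambda1 + n(m-1)*lambda2."""
    return (p["v"] == p["m"] * p["n"]
            and p["v"] * p["r"] == p["b"] * p["k"]
            and p["r"] * (p["k"] - 1)
                == (p["n"] - 1) * p["lambda1"] + p["n"] * (p["m"] - 1) * p["lambda2"])

for s in (2, 3, 4, 5):
    params = gd_parameters(s)
    print(s, params, satisfies_gd_identities(params))
```

These identities are necessary conditions only; existence of the design for prime-power $s$ rests on the geometric construction described above.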


Example. The lines of PG(2, 4) form a BIBD that contains the design formed by the lines of PG(2, 2). These designs have parameters

(a): $v = b = 21$, $r = k = 5$, $\lambda = 1$, and

(b): $v = b = 7$, $r = k = 3$, $\lambda = 1$, as shown below.







a ie eae ate. SS KKJJIGG 


shales Weare teberss 


ABFCJKGOSLMMLLMIHHIMK 
BFCJKGAPTPQNOQONQOPPN 
CJ KGABFQUTUSRSTTRUSRU 


The lower left hand group of blocks constitutes the design (b), and the lower right hand group of blocks is the GD design with parameters $v = b = 14$, $r = k = 4$, $\lambda_1 = 0$, $\lambda_2 = 1$, $m = 7$, $n = 2$. The groups are (D, E), (N, R), (P, U), (Q, T), (L, M), (O, S), and (H, I). Thus, for example, D occurs zero times with E in the GD design and once with N, R, P, U, Q, T, L, M, O, S, H and I.

A design with these parameters was obtained in [4] using the method of differences and is listed as number R24 in [5]. For $s = 3$ the resulting design has parameters $v = b = 78$, $r = k = 9$, $\lambda_1 = 0$, $\lambda_2 = 1$, $m = 13$, $n = 6$.


REFERENCES 
[1] R. C. Bose, “On the construction of balanced incomplete block designs,” Ann. Eugenics, Vol. 10 (1939), pp. 352-399.

[2] R. D. Carmichael, Introduction to the Theory of Groups of Finite Order, Dover Publications, 1956, pp. 334-336.

[3] R. C. Bose and W. S. Connor, “Combinatorial properties of group divisible incomplete block designs,” Ann. Math. Stat., Vol. 23 (1952), pp. 367-383.

[4] R. C. Bose, S. S. Shrikhande, and K. N. Bhattacharya, “On the construction of group divisible incomplete block designs,” Ann. Math. Stat., Vol. 24 (1953), pp. 167-195.

[5] “Tables of partially balanced designs with two associate classes,” Institute of Statistics, University of North Carolina, reprint series No. 50, p. 184 and p. 192.




ALTERNATIVE PROOF OF A THEOREM OF BIRNBAUM AND 
PYKE 


By Nicolaas H. Kuiper


Landbouwhogeschool, Wageningen, Netherlands


Let $U_1, U_2, \ldots, U_n$ be an ordered sample of a random variable (r.v.) $X$ having a uniform distribution on (0, 1). If $i^*$ is the value of $i = 1, 2, \ldots, n$ at which $i/n - U_i$ is maximized and $U^* = U_{i^*}$, then $U^*$ is a r.v. with values in (0, 1). The probability that the sample cannot be ordered or that $i^*$ is not uniquely defined is zero, and hence these possibilities are neglected. Theorem 3 of [1] states that $U^*$ has a uniform distribution on (0, 1). Another proof of this fact was given in [2].


Received June 30, 1958. 







In this note an alternative proof is given which entails little computation and is 
self-contained. 

Replace the interval (0, 1) by the reals modulo 1, considered as a circle of circumference 1. Let $c$ be an arbitrary point on the circle. Moving from $c$ in the direction corresponding to increasing values in (0, 1), one meets successively the points $U_{k+1}, U_{k+2}, \ldots, U_n, U_1, \ldots, U_k$ where $k$, so defined, is a r.v. depending on $c$. Rename these points $U_1', U_2', \ldots, U_n'$ respectively. Define $i = i(j)$ by $U_j' = U_i$. Let $u_j'$ denote the (arc) distance of $U_j'$ from $c$ taken in the increasing direction. Therefore,

    $i = k + j$;        $u_j' = U_{k+j} - c$            for $j = 1, \ldots, n - k$,

    $i = k + j - n$;    $u_j' = U_{k+j-n} + 1 - c$    for $j = n - k + 1, \ldots, n$.

With the indicated relation between $i$ and $j$ observe that

    $j/n - u_j' = (i - k)/n - U_i + c = i/n - U_i + c - k/n$.

For a fixed $c$ and a given sample, $c$ and $k$ are constants and hence $j/n - u_j'$ attains its maximum at the same point $U^* = U_{i^*}$ as does $i/n - U_i$.

Given a sample $U_1, \ldots, U_n$, the point $U^*$ on the circle of reals mod 1 is therefore independent of the choice of the initial point $c$ taken instead of 0 on this circle. Since the distribution of $X$ mod 1 is uniform, that is, invariant under translations, the distribution of $U^*$ mod 1 is also invariant under translations. Thus $U^*$ has a uniform distribution on (0, 1). q.e.d.


REFERENCES 


[1] Z. W. Birnbaum and Ronald Pyke, “On some distributions related to the statistic $D_n^+$,” Ann. Math. Stat., Vol. 29 (1958), pp. 179-187.

[2] Meyer Dwass, “On several statistics related to empirical distribution functions,” Ann. Math. Stat., Vol. 29 (1958), pp. 188-191.




QUASI-RANGES OF SAMPLES FROM AN EXPONENTIAL 
POPULATION 


By Paul R. Rider
Wright Air Development Center 


In a study of the use of ranges and quasi-ranges in estimating the standard 
deviation of a population, Harter [4] has compared the results for samples from 
a normal population with those for samples from certain other populations, in- 
cluding the exponential. In this note are given the distributions of quasi-ranges 
from the exponential population and also formulas for the cumulants of these 
quasi-ranges. 


Received June 23, 1958; revised October 17, 1958. 





Let $x_1, x_2, \ldots, x_n$ be a sample, and suppose that $x_1 \leq x_2 \leq \cdots \leq x_n$. The quasi-range of order $r$ of this sample is the statistic

(1)    $w_r = x_{n-r} - x_{r+1}$,

$w_0$ being the range itself.

The population to be considered is

(2)    $f(x) = e^{-x}$,    $x \geq 0$,

which has mean and variance each equal to 1. The cumulative distribution function for this population is

    $F(x) = \int_0^x e^{-x} \, dx = 1 - e^{-x}$,

and the probability that $r$ members of the sample are below $x_{r+1}$, $r$ above $x_{n-r}$, and the remaining values between $x_{r+1}$ and $x_{n-r}$, is proportional to

    $(1 - e^{-x_{r+1}})^r (e^{-x_{r+1}} - e^{-x_{n-r}})^{n-2r-2} e^{-r x_{n-r}} e^{-x_{r+1}} e^{-x_{n-r}} \, dx_{r+1} \, dx_{n-r}$.

Replacing $x_{n-r}$ by $x_{r+1} + w_r$, integrating with respect to $x_{r+1}$ between the limits 0 and $\infty$, and supplying the proper multiplicative constant, we find the distribution of the $r$th quasi-range, $w_r$, to be

(3)    $\dfrac{\Gamma(n - r)}{\Gamma(n - 2r - 1) \Gamma(r + 1)} (1 - e^{-w_r})^{n-2r-2} e^{-(r+1) w_r} \, dw_r$.

Obviously we must have $n \geq 2r + 2$.

Upon multiplying (3) by $e^{t w_r}$ and integrating between the limits 0 and $\infty$, we have ([2], p. 144) for the moment generating function of the $r$th quasi-range,

(4)    $M(t) = \dfrac{\Gamma(n - r) \Gamma(r + 1 - t)}{\Gamma(r + 1) \Gamma(n - r - t)} = \prod_{j=r+1}^{n-r-1} \left( 1 - \dfrac{t}{j} \right)^{-1}$.

To find the cumulants of the distribution (3) we note that

(5)    $K(t) = \ln M(t) = -\sum_{j=r+1}^{n-r-1} \ln\left( 1 - \dfrac{t}{j} \right) = \sum_{j=r+1}^{n-r-1} \sum_{k=1}^{\infty} \dfrac{1}{k} \left( \dfrac{t}{j} \right)^k$.

It follows at once that the cumulant of order $p$ is

(6)    $\kappa_p = (p - 1)! \sum_{j=r+1}^{n-r-1} \dfrac{1}{j^p}$.
In particular, we have for the mean and variance respectively,

(7)    $\kappa_1 = \sum_{j=r+1}^{n-r-1} \dfrac{1}{j}$,    $\kappa_2 = \sum_{j=r+1}^{n-r-1} \dfrac{1}{j^2}$.


Thus the mean of the quasi-range $w_r$, being equal to a partial sum of the harmonic series, diverges with sample size, although very slowly, while the variance approaches a finite value. For the case $r = 0$, that is for the range itself, the variance approaches the value $\pi^2/6 = 1.6449$, and somewhat rapidly. For example, the variance of the range of samples of size 10 is 1.5398.

For $r = 0$, the values of $\kappa_3$ and $\kappa_4$ approach 2.4041 and $\pi^4/15 = 6.4939$ respectively as $n$ becomes infinite. (Values can be obtained from tables of the Riemann zeta function, e.g. [3].) The ratios $\kappa_3 / \kappa_2^{3/2}$ and $\kappa_4 / \kappa_2^2$ approach 1.1395 and 12/5 respectively. For a normal distribution these ratios are, of course, both zero.
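The cumulant formula lends itself to direct computation. The sketch below is a minimal illustration with a hypothetical function name, `quasi_range_cumulant`; it evaluates $\kappa_p = (p - 1)! \sum_{j=r+1}^{n-r-1} j^{-p}$ for a given sample size and order, and uses a large sample size (20000, an arbitrary choice) as a stand-in for the limiting values quoted for the range.

```python
import math

def quasi_range_cumulant(p, n, r=0):
    """Cumulant of order p of the r-th quasi-range for samples of size n
    from the unit exponential population."""
    if p < 1:
        raise ValueError("the order p must be a positive integer")
    if n < 2 * r + 2:
        raise ValueError("the quasi-range of order r needs n >= 2r + 2")
    return math.factorial(p - 1) * sum(1.0 / j ** p for j in range(r + 1, n - r))

# Mean and variance of the range (r = 0) for a moderate sample size.
mean_10 = quasi_range_cumulant(1, 10)
var_10 = quasi_range_cumulant(2, 10)

# Large-sample values for the range, to compare with the stated limits.
big_n = 20000
k2, k3, k4 = (quasi_range_cumulant(p, big_n) for p in (2, 3, 4))
skew_ratio = k3 / k2 ** 1.5
kurt_ratio = k4 / k2 ** 2
print(mean_10, var_10, k2, k3, k4, skew_ratio, kurt_ratio)
```

Because the variance sum converges like $1/n$ while the mean grows like $\ln n$, the coefficient of variation of the range tends to zero as the sample size increases.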


REFERENCES 


[1] J. H. Cadwell, “The distribution of quasi-ranges in samples from a normal population,” Ann. Math. Stat., Vol. 24 (1953), pp. 603-613.

[2] Arthur Erdélyi, et al., Tables of Integral Transforms, Vol. 1, McGraw-Hill Book Co., New York, 1954.

[3] J. P. Gram, Tafeln für die Riemannsche Zetafunktion, Høst & Søn, Copenhagen, 1925.

[4] H. Leon Harter, “The Use of Sample Quasi-Ranges in Estimating Population Standard Deviation,” Wright Air Development Center Technical Report 58-200.

[5] Benjamin Epstein, “Estimates of mean life based on the rth smallest value in a sample of size n drawn from an exponential distribution,” Technical Report No. 2 (July 1, 1952), prepared under Contract Nonr-451(00) [NR-042-017].

[6] Benjamin Epstein, “Simple estimators of the parameters of exponential distributions when samples are censored,” Ann. Inst. Stat. Math., Vol. 8 (1956), pp. 15-26.

[7] Shozo Shimada, “Moments of order statistics drawn from an exponential distribution,” Reports of Statistical Application Research, Union of Japanese Scientists and Engineers, Vol. 4 (1957), pp. 153-158.




ACKNOWLEDGMENT OF PRIORITY 


By Ralph G. Stanton


V. N. Murty has kindly pointed out to me that the result of my note, “A Note
on Balanced Incomplete Block Designs,” (Ann. Math. Stat., Vol. 28 (1957), p.
1054), was given previously by K. Kishen and C. R. Rao in “An Examination of
Various Inequality Relations Among Parameters of the Balanced Incomplete
Block Design” (Journal of the Indian Society of Agricultural Statistics, Vol. IV,
No. 2 (1952), pp. 137-144).




CORRECTION TO “RANDOM ORTHOGONAL TRANSFORMATIONS AND 
THEIR USE IN SOME CLASSICAL DISTRIBUTION 
PROBLEMS IN MULTIVARIATE ANALYSIS” 


By Robert A. Wijsman


In footnote 3 of the paper cited in the title (Ann. Math. Stat. Vol. 28 (1957), 
pp. 415-423), for x? read x. 





ABSTRACTS 


ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Monterey Meeting of the Institute, November 14-15, 1958) 


1. On Computing Expectations in Sequential Analysis. Fred C. Andrews,
University of Oregon, and J. R. Blum, Indiana University.


Consider an arbitrary sequence of random variables $X_1, X_2, \ldots$, to be observed
sequentially and a corresponding sequence of statistics $f_1(X_1), f_2(X_1, X_2), \ldots,
f_j(X_1, \ldots, X_j), \ldots$, each of the latter with zero expectation. With m denoting the number
of random variables observed, determined by a sequential stopping rule, a necessary and
sufficient condition that $E(f_m) = 0$ for all truncated sequential stopping rules is that the
sequence $f_1, f_2, \ldots$ be a martingale. Application of this result is made to expectations
of sums and products including a form of the fundamental identity of sequential analysis
which is valid for unbounded stopping rules.


2. Exact Nonparametric Tests for Randomized Blocks. John E. Walsh,
Systems Development Corporation, Santa Monica, California. (By
title)


A class of nonparametric procedures for testing the statistical identity of treatments 
in randomized block experiments is suggested and discussed. The suggested procedures 
are squarely based on experimental within-block randomizations, and they may be chosen 
so as to have special power against particular alternatives. The blocks are assumed to be 
statistically independent but no assumption is made concerning the dependence within 
the various blocks. The basic idea is to obtain from each block a statistic that is, under 
the null hypothesis, symmetrically distributed about zero and then apply a nonparametric 
test of symmetry about zero. The observational data can be of any quantitative type. 


3. On the Determination of Joint Distributions from the Marginal Distributions
of Linear Combinations. Thomas S. Ferguson, University of
California, Los Angeles.


Let $Z_n = \alpha_n X + \beta_n Y$ where $\gamma_n = \alpha_n/\beta_n$ are all distinct for $n = 1, 2, \ldots$. A sufficient
condition that the joint distribution of (X, Y) be determined uniquely by the distributions
of the $Z_n$ is that there exist an integer m such that (1) $E \exp\{t|Z_m|\} < \infty$ for some t > 0
and (2) there is a limit point of the $\gamma_n$'s (possibly $\pm\infty$) other than $\gamma_m$. Conversely when
the joint distribution has a piece of positive continuous density somewhere, the distribu- 
tions of a finite number of the Z,’s do not determine the joint distribution. Thus in par- 
ticular the bivariate normal distribution is determined uniquely by any infinite collection 
of distinct linear combinations of the variables and by no finite number of them. These 
results extend to many dimensions, with suitable modifications. 


4. Approximation to the Probability Density of Zero-Crossings Intervals of a
Gaussian Process. Sylvain Ehrenfeld, New York University. (By
title)


Let z(t) be a stationary Gaussian process with a given spectrum w(f), and let $P_0(t)$
be the probability density of the lengths of intervals between successive zeros in this
process. In the present paper several approximations to $P_0(t)$ are obtained. This is achieved
by the evaluation of multiple integrals whose evaluation is equivalent to finding
“volumes” of hyperspherical simplices in n-dimensional space. The above geometrical
problem is solved for certain cases (n = 3, 4) and gives the desired approximations. The
problem of finding the volume of a hyperspherical simplex reduces (for n = 4) to the problem
of finding the 3-dimensional volume of a tetrahedron-like figure (the analogue of a
spherical triangle) on the surface of a 4-dimensional sphere. The final part of the paper
is concerned with a comparison between the various approximations to a count of zero
crossings, taken from a series of ocean pressure records. There also is a discussion of
computational problems.


5. The Probability in the Extreme Tail of a Convolution. David Blackwell
and J. L. Hodges, Jr., University of California, Berkeley. (By title)


Let $X_1, X_2, \ldots$ be independent identically distributed integer-valued random variables,
such that 0 is a possible value and the g.c.d. of possible values is 1. Suppose $E(e^{tX_1})$
is finite for some $t > 0$. For any number a with $E(X_1) < a < \sup X_1$ there are a distribution
p on the values of $X_1$ and a number m, $0 < m < 1$, such that
$\mathrm{Prob}\{X_1 + \cdots + X_n = na\} = r_n(1 + O(n^{-2}))$ as $n \to \infty$ through those values for which na is a possible
value of $X_1 + \cdots + X_n$, where $r_n = m^n[1 + ((\mu_4/\mu_2^2) - 3 - (5\mu_3^2/3\mu_2^3))(1/8n)]/\sigma\sqrt{2\pi n}$,
and $\sigma, \mu_2, \mu_3, \mu_4$ are the standard deviation and central moments of the p distribution.
By ignoring certain error terms, an approximation $\mathrm{Prob}\{X_1 + \cdots + X_n \ge na\} = c\,r_n(1 - d/n)(1 + O(n^{-2}))$
is obtained. It is noted that $\mathrm{Prob}\{X_1 + \cdots + X_n = na\} = m^n\,\mathrm{Prob}\{Y_1 + \cdots + Y_n = na\}$,
where $Y_1, Y_2, \ldots$ are independent variables with the p
distribution. Some numerical illustrations of the accuracy of the approximations are
given.


6. Asymptotic Methods of Evaluating the Integral from a to ∞ of f(x). Wyman
Richardson, University of North Carolina. (By title)


Three iterative procedures are considered. Procedure A: a transformation is applied
carrying ∞ into 0 and a into b. Then the integral, “F(a)”, is expanded in a Taylor series
about 0 or b. Procedure B: $F_2(a) = F_1(a)f(a)/f_1(a)$, where $f_1(a) = -F_1'(a)$. Procedure C:
$F(a) = f(a)[v_1(a) + \cdots + v_p(a)] + \int_a^{\infty} f(x)v_p'(x)\,dx$, where $v_1(a) = -f(a)/f'(a)$ and
$v_i(a) = v_1(a)v_{i-1}'(a)$ (Laplace, Winckler). A battery of order theorems is proved, using the Cauchy
definition: “order of f = r” means “$x^{-r-\epsilon} < f(x) < x^{-r+\epsilon}$ for x large”. The order of the
“relative error”, $F_1(x)/F(x) - 1$, equals that of the “relative frequency error”,
$f_1(x)/f(x) - 1$. If the order of f = ∞, B and C are usually “asymptotic I”: the order
of the relative error → ∞ as the number of terms increases; and “asymptotic II”: (the
relative error for n + 1 terms)/(the relative error for n terms) → 0. If the order of f is
finite, linear combinations of the terms are taken to make the procedure asymptotic II.
Useful formulae, such as $F(a) \sim -r[f(a)]^2/f'(a)(r - 1)$, and $-[f(a)]^2/f'(a)[2 - u(a)]$,
where $u(x) = f(x)f''(x)/[f'(x)]^2$, are obtained from a few terms of these procedures and
applied to statistical distributions. They are asymptotic as a → ∞. Analogous procedures
for sums of series and finite tails are considered.


7. Some Properties of Binary Arrays Which Are Generated by Iterated Sequences
and Reversals. H. von Guerard, Lockheed Aircraft Corporation.


From a sufficiently extended binary event another one is deduced, by assigning 0's to
sequences and 1's to reversals. This operation satisfies the group of additivity of a binary
(modulo 2) ring algebra (R. C. Bose and R. R. Kuebler, Univ. N.C., Inst. Stat., Mimeo.
Ser. 199 (1958)). Repeated application generates an array of binary numbers, whose
frequency ratios (= No. of 1's/No. of 0's) (a) converge to 1, (b) repeat periodically, (c) assume
relative extrema by geometric progression of iteration order, (d) behave irregularly, i.e.
neither (a), (b) nor (c), if, and only if, the initiating process is (a) randomized, incl.
conditioned processes, (b) periodic, (c) transient (single peaks or steps), (d) inductive. To
(b): M. Kochen and E. H. Galanter (Inform. Contr. 1, 267-288 (1958)) determined the
elements of minimal generating sets (mgs) for λ-placed binary numbers (here: periods), from
which all others could be deduced by completion or by translation. By iterated addition
(mod. 2), any periodic process generates periodically repeated arrays (of periodic processes),
whose elements are exactly those of the mgs's, none of them occurring more than once.
Since these arrays, in the average, imply more than 1 element, as λ increases, their number
becomes progressively small if compared to the number of elements of the mgs's.
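
The operation in this abstract can be illustrated with a small sketch under one assumed reading: a "sequence" is a pair of equal adjacent digits (coded 0) and a "reversal" a pair of unequal adjacent digits (coded 1), so that each step is addition mod 2 of neighbours. This reading, the cyclic treatment of one period of a periodic process, and the names `derive`, `iterate` and `frequency_ratio` are assumptions made for illustration only, not details taken from the abstract.

```python
def derive(bits):
    """One step on a cyclic binary word: 0 where neighbours agree
    (a 'sequence'), 1 where they differ (a 'reversal'), i.e. addition mod 2."""
    n = len(bits)
    return [bits[i] ^ bits[(i + 1) % n] for i in range(n)]


def iterate(bits, steps):
    """Return the array whose rows are the word and its successive derived words."""
    rows = [list(bits)]
    for _ in range(steps):
        rows.append(derive(rows[-1]))
    return rows


def frequency_ratio(bits):
    """Number of 1's divided by number of 0's (infinite if there are no 0's)."""
    zeros = bits.count(0)
    return bits.count(1) / zeros if zeros else float("inf")


if __name__ == "__main__":
    # One period of a periodic process of length 3.
    for row in iterate([0, 1, 1], 3):
        print(row, frequency_ratio(row))
```

Under this reading the period-3 word 0 1 1 returns to itself after three steps, which is the kind of periodically repeated array the abstract describes for periodic initiating processes.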


(Additional abstract for the Cambridge Meeting of the Institute, August 26-29, 1958) 


30. On the Bounds for the Variance of Mann-Whitney Statistic. J. S. Rustagi,
Michigan State University. (By title)


Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be two random samples from strictly
increasing continuous cumulative distribution functions (cdf's) F(x) and G(y) respectively.
Then the Mann-Whitney statistic U is given by U = number of pairs $(X_i, Y_j)$ such that
$Y_j < X_i$, $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$. Let $L(t) = F(G^{-1}(t))$. Then the variance of U is
$V(U) = mn[(m - 1)\int_0^1 (L(t) - kt)^2\,dt + A]$, where A is a constant free of L(t) and k =
(n − 1)/(m − 1). Utilizing the techniques and results of an earlier paper by the author
(Ann. Math. Stat., Vol. 28, pp. 309-328), lower and upper bounds for V(U) are determined
in terms of $p = P(Y < X) = \int_{-\infty}^{\infty} F(t)\,dG(t) = 1 - \int_0^1 L(t)\,dt$. The problem essentially
is that of minimizing and maximizing $\int_0^1 (L(t) - kt)^2\,dt$ over a class of cdf's L(t) defined
over [0, 1] such that $\int_0^1 L(t)\,dt = 1 - p$. Lower bounds are also obtained for V(U) under
an additional restriction that X is stochastically smaller than Y or $L(t) \ge t$ for $0 \le t \le 1$.
(Received July 7, 1958, revised November 24, 1958).
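
The definition of U in this abstract translates directly into a count over all pairs. The sketch below is only illustrative: the function names `mann_whitney_u` and `estimate_p` and the sample values are assumptions, and the ratio U/(mn) is shown simply as the natural sample estimate of p = P(Y < X).

```python
def mann_whitney_u(xs, ys):
    """U = number of pairs (X_i, Y_j) with Y_j < X_i."""
    return sum(1 for x in xs for y in ys if y < x)


def estimate_p(xs, ys):
    """Sample proportion U / (m * n), an estimate of p = P(Y < X)."""
    return mann_whitney_u(xs, ys) / (len(xs) * len(ys))


if __name__ == "__main__":
    xs = [3.0, 5.0]        # hypothetical X sample, m = 2
    ys = [1.0, 4.0, 6.0]   # hypothetical Y sample, n = 3
    print(mann_whitney_u(xs, ys), estimate_p(xs, ys))
```

The double loop costs m times n comparisons, which is adequate for small samples; a rank-based computation would be the usual choice for large ones.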




NEWS AND NOTICES 


Readers are invited to submit to the Secretary of The Institute news items of interest.

Personal Items


Frances Campbell Ameniya, formerly chairman of the Department of Mathe- 
matics at George Pepperdine College in Los Angeles, California, has been ap- 
pointed Associate Professor of Mathematics at California Western University 
in San Diego, California. 

R. E. Barlow is now working on a doctorate at Stanford while employed at 
Sylvania Electronic Defense Laboratory, Mt. View, California, as a mathe- 
matical statistician. 

Ishu Bangdiwala is on a leave of absence from his position as Head of the 
Department of the Statistics Section of the Agricultural Experiment Station of 
the University of Puerto Rico, to accept the position as Assistant Director of 
Research in the Superior Council on Education, which is the governing board 
of the University. 

Jerome Cornfield, assistant chief of the Biometrics Branch, Division of Research
Services, National Institute of Health, has been appointed to two professorships
at the Johns Hopkins Medical Institutions. He is professor and chairman
of the Department of Biostatistics in the School of Hygiene and Public
Health, succeeding William G. Cochran, and also fills the newly created post of
professor of Biomathematics in the School of Medicine.

John W. Cotton has returned to his post as assistant professor of Psychology 
at Northwestern University following a year as Postdoctoral Fellow in the De- 
partment of Statistics at the University of Chicago. 

Jean Engler is spending her second year of a postdoctoral National Science 
Foundation Fellowship at the Department of Statistics, Harvard University. 

Edgar H. Fickensher, formerly Principal of Oroville Union High School, 
Oroville, California, has moved to Stanford, California to finish work on a 
Ph.D. in Education. 

Paul Gunther is currently working as a consultant in statistics and opera- 
tions research. 

Dr. Shanti S. Gupta spent the academic year 1957-58 as an associate professor
in the Mathematics Department of the University of Alberta and later
was a fellow at the Summer Research Institute of the Canadian Mathematical
Congress at Queen’s University. He has now joined the Bell Telephone Laboratories
in Allentown, Pennsylvania as a member of the Technical Staff.

Dr. John W. Hamblen, formerly Associate Professor of Mathematics and 
Director of the Computing Center at Oklahoma State University, has been 
employed as full time Director of the Computing Center which has been estab- 
lished under the general supervision of the Office of the Vice President of the 
University of Kentucky. 

Nancy Lee Hannye is now Assistant Professor at Michigan State University. 

Dr. Bernard Harris has resigned his position as Mathematician, U. S. Department
of Defense and accepted an appointment as Assistant Professor of
Mathematics, University of Nebraska.

Assistant Professor L. L. Helms of Michigan State University has been ap- 
pointed to an assistant professorship at the University of Illinois. 

David Hogben has obtained a leave of absence from the Western Electric 
Co., Kearny, New Jersey, to pursue studies at Iowa State College for the Ph.D. 
degree. 

John R. Howell has recently joined the staff of the University of Dayton 
Research Institute as head of the Computer Section. 

Howard L. Jones has retired as General Supervisor of Statistics with the 
Illinois Bell Telephone Company, and has accepted an appointment as Pro- 
fessor of Statistics in the School of Business at the University of Chicago. 

Dr. Orval M. Klose, formerly Associate Professor and Head of the Mathe- 
matics Department at Seattle University, has accepted a position as Associate 
Professor of Mathematics at Humboldt State College in Arcata, California. 

John M. Leiman has resigned his position with the Personnel Laboratory,
Wright Air Division Center, and has accepted a position with System Development
Corporation of Santa Monica, California.

Dr. Radha G. Laha has been appointed Research Assistant Professor in the 
Department of Mathematics at the Catholic University of America for the year 
1958-59. 

Ronald Pyke has accepted a position as Assistant Professor in the Depart- 
ment of Mathematical Statistics at Columbia University for the year 1958-59. 
He formerly held a similar position at Stanford University. 

Philburn Ratoosh, on leave from the Ohio State University, is Visiting Asso- 
ciate Professor of Psychology, at the University of California, Berkeley, for 
1958-59. 

Enders A. Robinson has been appointed Assistant Professor in the Depart- 
ment of Mathematics at the University of Wisconsin, Madison, Wisconsin. 

Jagdish S. Rustagi, formerly with the Department of Statistics, Michigan 
State University, has accepted the position of Reader in Statistics, Department 
of Mathematics, Muslim University, Aligarh, India. 

Professor Henry Scheffé of the University of California will be Visiting Pro- 
fessor at Princeton University for the academic year 1958-59, where his work 
with the Statistical Techniques Research Group will be partially supported by 
a grant from the National Science Foundation. 

After spending the 1957-1958 year as a Fellow at the Center for Advanced 
Study in the Behavioral Sciences, Sidney Siegel has returned to his position as 
Associate Professor of Psychology at Pennsylvania State University. 

Jack Silber spent the summer of 1958 as Consultant to the Assistant for 
Operations Analysis, USAF and has returned to Roosevelt University as Pro- 
fessor of Mathematics and Acting Assistant to the Dean of Faculties. 

Romuald Slimak has been recently appointed Manager, Univac Computing 
Center, Remington Rand Univac Division, Sperry Rand Corporation, 315 4th 
Avenue, New York 10, New York. 

Seiji Sugihara is now Research Specialist at Lockheed Aircraft Corporation, 
Missile Systems Division, Sunnyvale, California. 

Dr. Balkrishna V. Sukhatme has been appointed Professor of Statistics in the 
Indian Council of Agricultural Research, New Delhi, India. 

Dr. John W. Wilkinson, formerly Assistant Professor of Mathematics at
Queen’s University, Kingston, Ontario, has accepted a position as Research
Statistician at Westinghouse Research Laboratories, Pittsburgh 35, Pennsylvania.

Dr. W. H. Williams has resigned as Assistant Professor of Statistics at Iowa 
State College to become Assistant Professor of Mathematics at McMaster 
University, Hamilton, Canada. 

Mr. D. M. G. Wishart has been appointed Lecturer in Statistics in the De- 
partment of Pure Mathematics of the University of Birmingham, England, 
from October 1, 1958. 







New Members 
The following persons have been elected to membership in the Institute 
June 23, 1958, to October 16, 1958 


Adorno, David S., M.A. (Penn. State Univ.), Research Assistant, Dept. of Statistics, Har- 
vard University, Cambridge, Mass.; 823 South Road, Bedford, Mass. 

Ando, Albert K., Ph.D. (Carnegie Inst. of Tech.), Assistant Professor, Dept. of Economics, 
Massachusetts Institute of Technology, Cambridge 39, Mass. 

Bagai, Om Parkash, M.A. (Panjab, India), Student for Ph.D., Math., U.B.C., Vancouver, 
Canada; Dept. of Math., U.B.C., Vancouver, Canada. 

Bardwell, George E., M.S. (Univ. of Colorado), Assistant Professor, University of Denver; 
14465 Cleveland Place, Denver 2, Colorado. 

Barney, Jesus, C.P.A. (Monterrey Inst. of Tech.), Professor of Math. Stat. and Auditing, 
Monterrey Institute of Technology, Monterrey, N.L., Mexico; Sucursal De Correos 
“J”, Monterrey, N.L., Mexico. 

Cannon, L. Dennis, M.S. (Purdue Univ.), Graduate Research Assistant, Purdue University
Statistical Laboratory, West Lafayette, Indiana; 137 South Salisbury St., West Lafayette, 
Indiana. 

Chacko, V. John, M.Sc. (Univ. of Travancore), Graduate Student, University of California
(T.A.), University of California, Berkeley; Dept. of Statistics, University of California, 
Berkeley 6, California. 

Chipman, John S., Ph.D. (The Johns Hopkins Univ.), Associate Professor of Economics, 
University of Minnesota, Minneapolis 14, Minn.; Dept. of Economics, University of 
Minnesota, Minneapolis 14, Minn.

Christie, Theodore J., B.S., (Rutgers Univ.), Doctoral Candidate, Dept. of Ind’l. and 
Admin. Eng’g., Cornell University, Ithaca, New York; 78 Roosevelt Street, Cress- 
kill, New Jersey. 

Cogburn, Robert F., A.B. (Univ. of California), Student, University of California, Berkeley 
4, California; 1716A Francisco, Berkeley 3, California. 

Copeland, Lewis C., Ph.D. (Duke Univ.), Acting Head of Dept. and Professor of Statistics, 
Dept. of Statistics, University of Tennessee, Knoxville, Tenn. 

Corsten, Leo C. A., Dr. (Agr. Univ., Wageningen, Netherlands), Head, Statistical Dept., 
IVRO, Wageningen, Netherlands; 400 Smith Ave., Chapel Hill, North Carolina.

Current, James L., MAT (Indiana Univ.), Mathematician, National Security Agency,
Washington 25, D. C.; 4521 29th Street, Mt. Rainier, Maryland.

De Figueiredo, Djairo Guedes, C.E. (National School of Engineering, Rio de Janeiro, 
Brazil), fellowship from National Council of Researches (Brazil), New York University; 
102-22 62nd Road, Forest Hills, 75, N. Y. 

Folks, John Leroy, Ph.D. (Iowa State College), Operations Research Engineer, Texas
Instruments Incorporated, 13500 N. Central Expressway, Dallas, Texas; P. O. Box 312,
Dallas, Texas.

Goldman, Herbert M., M.A. (Univ. of Illinois), Statistician, University of Illinois, Urbana, 
Illinois; 1111 S. Arbor Ave., Champaign, Ill.

Foradori, George Thomas, M.S. (N.C. State College), Research Assistant, Dept. of Experi- 
mental Statistics, N.C. State College, Raleigh, North Carolina. 

Gabriel, Kuno Ruben, Ph.D., (Hebrew Univ.), Instructor, Hebrew University, Jerusalem, 
Israel. 

Hager, Frederick W., MS (Univ. of Delaware), Assistant Professor of Mathematics, United 
States Naval Academy, Annapolis, Maryland. 

Hawthorne, George Boltz, Jr., M.S. (Georgia Inst. of Tech.), Assistant Professor of E. E.
and part-time student, Georgia Institute of Technology; 225 North Avenue, Atlanta 13,
Georgia.







Hefner, Oscar V., B.S. (Georgia Inst. of Tech.), Graduate Student, Georgia Tech.; Rich 
Electronic Computer Center, Ga. Tech., Atlanta 18, Ga. 

Hernandez, Antonio, Ingeniero Industrial, (Escuela Tecnica Superior de I.1.), Professor 
of Fundamental Statistics and Assistant to the Managing Director of G. E. Espanola, 
Escuela Tecnica Superior de I.I. and General Electrica Espanola, S.A., Campo de San
Mames, Plaza de Federico Moyua 4, Bilbao, Spain; Egana 14, Bilbao, Spain. 

Johnson, Linwood A., B.I.E., (Georgia Inst. of Tech.), Graduate Student, Georgia Institute
of Technology, North Avenue, Atlanta, Ga.; 281 N. Colonial Homes Circle, N.W.,
Atlanta 9, Ga.

Klotz, Jerome H., A.B., (Univ. of California), Research Assistant, University of California, 
Berkeley, California; Dept. of Statistics, Univ. of California, Berkeley 4, California. 

Koenig, Robert A., M.S. (Rutgers Univ.), Statistician, National Lead Co., Titanium Divi- 
sion, P. O. Box 58, South Amboy, New Jersey; 4 Maple Avenue, Matawan, New Jersey. 

Larson, Harold J., M.S. (Iowa State College), General Electric Fellow in Statistics, Iowa
State College; Statistical Laboratory, Iowa State College, Ames, Iowa. 

Littell, Arthur S., Sc.D. (Johns Hopkins Univ.), Assistant Professor of Biostatistics, West- 
ern Reserve University, School of Medicine, Cleveland 6, Ohio. 

McGahey, Mary B., B.S. (Radford College), Graduate Student in Statistics, V.P.I. Blacks- 
burg, Va.; Box 11—Station A, Radford, Virginia. 

Mallios, William S., B.S. (Purdue Univ.), Research Assistant, N. C. State College; Box 
5457, Raleigh, North Carolina. 

Maloney, Richard C., M.S., (Univ. of Southern California), Supervisor, Statistical Analysis
Unit, Reliability Gp., Rocketdyne, 6633 Canoga Blvd., Canoga Park, California; 7112 
Delco Ave., Canoga Park, California. 

Maxwell, William L., B.M.E. (Cornell Univ.), Graduate Student, Cornell University, 
Ithaca, New York; Upson Hall, Cornell University, Ithaca, New York. 

Mendelsohn, Jay, B.A., (New York Univ.), Research Engineer, Grumman Aircraft Engi- 
neering Corporation, Bethpage, Long Island, New York; 150-40 71st Avenue, Flushing 
67, New York. 

Mikhail, Wadie Fultas, M.Sc. (Univ. of Cairo, Egypt), Graduate Student, Dept. of Statis- 
tics, Univ. of North Carolina; 2/0 Phillips Hall, Chapel Hill, North Carolina. 

Neathammer, Robert D., M.A. (Univ. of Illinois), Mathematician, U. S. Navy Ammunition
Depot, Quality Evaluation Laboratory, Crane, Indiana; 518 Cedar Street, Centralia, 
Illinois. 

Olson, Milton P., B.S. (Stanford Univ.), Student, Stanford University; 405 Park Street, 
Turlock, California. 

Patil, Ganapati P., M.Sc. (Univ. of Poona, India), Teaching Fellow, Dept. of Mathematics, 
University of Michigan, Ann Arbor, Michigan. 

Ray, Santosh Kumar, M.Sc. (Lucknow Univ., India), Graduate Student and Research As- 
sistant, Dept. of Mathematical Statistics, Columbia University, New York 27, New York. 

Rhodes, Benjamin T., Jr., M.A. (Univ. of Texas), Graduate Assistant, Oklahoma State 
University, Stillwater, Oklahoma; Unit 31, Apt. 1; North University Place, Stillwater, 
Oklahoma. 

Rivers, Joye Boring, MEd., (Univ. of Florida), Research Assistant and Student, Radar De- 
velopment Branch, Engineering Experiment Station, Georgia Institute of Technology, 
225 North Ave., Atlanta 13, Ga.; 4785 E. Conway Dr., N.W., Atlanta 5, Georgia. 

Rothstein, Marvin, M.S. (New York Univ.), Mathematician, Service Bureau Corp., 635 
Madison Ave., New York, N. Y.; 66 West 53rd Street, New York, N.Y.

Roy, Jagabrata, D.Phil. (Univ. of Calcutta), Lecturer, Research and Training School, 
Indian Statistical Institute, 203 B.T. Rd., Calcutta 35, India. 

Shuford, Emir H., Jr., Ph.D. (Univ. of Illinois), Assistant Professor, Dept of Psychology, 
University of North Carolina, Chapel Hill, N. C.; Psychometric Laboratory, University 
of North Carolina, Chapel Hill, N.C. 







Siniff, Donald B., M.S. (Michigan State Univ.), 2nd Lt., USAF, Student Mail Room, Box 
5840, Harlingen AFB, Harlingen, Texas; Box 482, Grafton, Ohio.

Sowinski, John J., S.B. (Depaul Univ.), Research Statistician, The Toni Company, Division
of the Gillette Company, 456 Merchandise Mart, Chicago, Ill.

Starr, Selig, M.A., (George Washington Univ.), Head, Applied Mathematics and Statistics 
Group, Research and Development Dept., U.S. Naval Propellant Plant, Indian Head, 
Maryland; 812 University Blvd., E., Silver Spring, Md.

Stiassny, Simon, M.A., (Univ. of California), Statistician, I.B.M. Research Center, Box 
218, Yorktown Heights, New York. 

Sullivan, Rebecca J., B.A. (Michigan State Univ.), Graduate Research Assistant, Dept. of 
Statistics, Michigan State University, East Lansing, Michigan; 2209 Alpha St., Lansing, 
Mich. 

Tsao, Rhett F.S., M.A. (Oklahoma State Univ.), Graduate Assistant in Statistics, Oklahoma
State University, Stillwater, Oklahoma; Statistical Laboratory Dept. of Mathematics,
Oklahoma State University, Stillwater, Oklahoma.

Vithayasai, Chitra, B.S. (Chulalongkorn Univ. of Thailand), Rice Statistician, Rice Dept., 
Ministry of Agriculture, Bangkok, Thailand; Dept. of Plant Breeding, Cornell University, 
Ithaca, New York. 

Wagner, Harvey, M., M.S. (Stanford Univ.), Assistant Professor, Industrial Engineering 
and Statistics, Stanford University, Stanford, California. 

West, Del L., B.S. (Southeastern State College), Graduate Assistant, Oklahoma State Uni- 
versity, Stillwater, Oklahoma; 645 Bennett Dr., Stillwater, Okla. 

Wheatley, Thomas V., M.S. (Illinois Inst. of Tech.), Quality Control Engineer, Convair, 
A Division of General Dynamics, Pomona, California; P. O. Box 884, Pomona, Cali- 
fornia. 

Woo, Jae Lin, M.S. (Massachusetts Inst. of Tech.), Instructor, Textile Eng’g., College of 
Engineering, Seoul National University, Seoul, Korea; 403 Beacon Street, Box 15, Massa- 
chusetts. 

Woo, Juo Chuan, M.S. (N. C. State College), Assistant Director of Processing Research, 
The Textile Research Center, School of Textiles, N.C. State College, Raleigh, North Caro- 
lina. 

Yamane, Taro, Ph.D., (Univ. of Wisconsin), Assistant Professor, New York University, 
School of Commerce, Dept. of Economics, Washington Square, New York 3, New York. 

Yamamoto, Sumiyasu, M.S. (Hiroshima Univ.), Professor of Statistics, Dept. of Statistics, 
Nara Medical College Kashihara, Nara, Japan. 

Zadeh, Lofti A., Ph.D. (Columbia Univ.), Professor of Electrical Engineering, Columbia 
University, New York 27, New York; 850 James Street, Pelham Manor, New York.

Zilmer, Delbert E., Ph.D. (Univ. of Wisconsin), Mathematician, U.S. Naval Ordnance
Test Station, China Lake, California; 701-B Lexington Ave., China Lake, California.




SUMMER OFFERINGS IN STATISTICS AT IOWA STATE 
COLLEGE 


The Department of Statistics at Iowa State College will offer six applied 
courses in statistical theory and methods in its two 1959 summer sessions. 
These courses are planned primarily for graduate students or research workers 
with limited mathematical backgrounds who wish to use statistical techniques 
intelligently for application to other fields. In addition, a course on special topics 
in theoretical or applied statistics may be studied at the graduate level. Senior 
staff members will be available during most of the summer for consultations on 
research or special problems. 







Students may register for either or both of the six-week summer sessions: 
June 8-July 15 and July 15-August 21. The complete list of statistics offerings 
for the first session is as follows: Stat. 401, “Statistical Methods for Research
Workers” (at the level of Snedecor’s Statistical Methods); Stat. 447, “Statistical
Theory for Research Workers” (mainly theory of experimental statistics at the
level of Anderson and Bancroft’s Statistical Theory in Research); Stat. 599,
“Special Topics”; and Stat. 699, “Research.” In the second session will be
offered Stat. 402, a continuation of 401; Stat. 448, a continuation of 447; two 
courses in applied methods which are more specialized, Stat. 411, “Experimental 
Designs for Research Workers,” and Stat. 421, “Survey Designs for Research 
Workers”; and finally Stat. 599 and 699. Additional information may be ob- 
tained from T. A. Bancroft, Department Head and Director, Statistical Labora- 
tory, Iowa State College. 




INTERDISCIPLINARY CONFERENCE ON SELF-ORGANIZING 
SYSTEMS 


An Interdisciplinary Conference on Self-Organizing Systems will be held on 
May 5th and 6th, 1959, at the Museum of Science and Industry, Chicago, 
Illinois. The conference is to be co-sponsored by the Information Systems 
Branch of the Office of Naval Research and the Armour Research Foundation. 
The purpose of this conference is to bring together research workers in all fields 
of science who are concerned either with the development of self-adaptive
information systems or with the conduct of research which may contribute to an
improved understanding of cognitive, learning, and growth processes. Particular
emphasis will be placed on theoretical models of systems which are capable of 
spontaneous classification, identification, and symbolization of their inputs. 

Interested individuals may receive further information and a preliminary 
conference program when available by writing to Mr. Scott Cameron, ICSOS 
Conference Secretary, Armour Research Foundation, 10 West 35th Street, 
Chicago 16, Illinois. 



A NEW JOURNAL OF STATISTICS 


The Statistical Society of New South Wales, founded in 1947, is the only 
Society in Australia concerned with all aspects of statistics and related fields. 
Among its activities are general and section meetings and symposia. The latter 
have been attended by people from all over Australia. It has also published a 
Bulletin; the present editor is H. S. Konijn.

The Society has now been enabled to issue a printed Journal, to be called 
The Australian Journal of Statistics, and to be issued three times a year. The 
Editorial Board consists of H. S. Konijn, H. O. Lancaster and R. S. G. Rutherford
(address: The University of Sydney, Sydney, Australia). The Journal expects
to have contributions from Australia and abroad in the field of statistical
theory and applications. Prospective authors are requested to send for a list of
instructions regarding the form of the manuscripts. Issues will be available to
non-members at 10 shillings a copy.




NATIONAL SCIENCE FOUNDATION BULLETINS 


The National Science Foundation announces a new series of bulletins which, 
when completed, will represent an inventory listing of all significant scientific 
information sources or activities within the Federal Government. The primary 
objective of this program is to make unclassified, unpublished scientific research 
information easily accessible and readily available to all U. S. scientists and
engineers, both in and out of the Government. 

An earlier NSF report, “Organization of the Federal Government for Scien- 
tific Activities”, describes the pattern according to which the various Govern- 
ment agencies are organized for the conduct of basic research, applied research, 
development, and other scientific activities including scientific information. The 
documents in this series will present, for these same Government branches, in- 
formation on general subject fields in which scientific reports are prepared, 
categories of scientific reports issued, policies regarding the announcement and 
availability of these reports to the scientific community, and locations and 
policies of the agencies’ libraries and information centers, etc. 

Bulletin No. 1, October 1958, gives information on the Department of Agri- 
culture. Further information may be obtained from the URI Program Director, 
National Science Foundation, Science Information Service, Washington 25, 
D.C. 


——


VISITING FOREIGN MATHEMATICIANS 


The following list (dated October 27, 1958) of visiting foreign mathematicians 
has been received from the Division of Mathematics, National Academy of 
Sciences, National Research Council. The information given is, in order, the 
name, home country, host institution, and period of visit; AY stands for aca- 
demic year. The names of persons whose visit terminates before May, 1959, 
have not been included. 


Alvarez de Araya, Jorge, Chile, University of Washington, Sept. 1958-Sept. 1959; 
Austin, M., U. K., University of Wisconsin, Sept. 1958-Sept. 1959; Baayen, Pieter C., 
Netherlands, University of California, Berkeley, AY 1957-59; Besicovitch, A. S., England, 
University of Pennsylvania, Sept. 1958-June 1959; Bialynicki, Andrzej, Poland, University 
of California, Berkeley, AY 1958-59; Blakers, Albert, Australia, Princeton 
University, Sept. 1958-August 1959; Bofinger, Victor, Australia, North Carolina State 
College, June 1957-June 1959; Chapman, Sydney, England, University of Michigan, 1958-59; 
Chartres, Bruce A., Australia, Brown University and Massachusetts Institute of Technology, 
Sept. 1958-May 1959; Chwe, Byoung-Song, Korea, University of California, 
Berkeley, AY 1957-59; Craggs, James W., U. K., Brown University, Sept. 1958-Sept. 1959; 
Drazin, Philip G., England, Massachusetts Institute of Technology, Sept. 16, 1958-June 
15, 1959; Duguid, A. M., Australia, Brown University, Aug. 15, 1958-May 31, 1959; Flor, 
Peter, Australia, Duke University, July 1958-June 1959; Foguel, Shaul R., Israel, University 
of California, Berkeley, Sept. 1958-June 1959; Fuhrken, Gebhard, Germany, University 
of California, Berkeley, AY 1957-59; Gani, Joseph M., Australia, Columbia University, 
Feb. 1959-Feb. 1960; Grewe, Rudolf, Spain, University of California, Berkeley, 
AY 1957-59; Gribben, Ronald J., England, Massachusetts Institute of Technology, Sept. 
16, 1958-June 15, 1959; Ha, Kwang Chul, Korea, University of North Carolina, Sept. 15, 
1958-June 30, 1959; Hano, Jun-ichi, Japan, University of Chicago, Sept. 1, 1958-Aug. 31, 
1959; Helmberg, Gilbert M., Austria, Institute of Int. Education and Tulane University, 
July 1958-Sept. 1959; Helms, Hans, Denmark, Tufts University, Sept. 1958-June 1959; 
Hilton, Peter, England, Cornell University, Sept. 1958-Sept. 1959; Hull, T. E., Canada, 
California Institute of Technology, Sept. 1, 1958-Aug. 31, 1959; Irmay, Shragga, Israel, 
Institute of Mathematical Sciences, New York University, Sept. 1958-August 1959; 
Klingen, Helmut, Germany, University of California, Sept. 1958-June 1959; Kristensen, 
Leif, Denmark, Yale University, Aug. 1958-Aug. 1959; Krzywicki, A., Poland, University 
of Kansas, Oct. 1958-July 1959; Laha, Radha G., India, Catholic University, Sept. 1957-May 
1959; Leis, Rolf, Germany, Institute of Mathematical Sciences, New York University, 
Sept. 1958-Aug. 1959; Levy, Azriel, Israel, Massachusetts Institute of Technology, Sept. 
16, 1958-June 15, 1959; Matusita, Kameo, Japan, Princeton University, AY 1958-59; 
Mitchell, A. R., Scotland, California Institute of Technology, Mar. 1, 1959-Sept. 30, 
1959; Morland, L. W., England, Brown University, Sept. 1, 1958-May 31, 1959; Mostowski, 
Andrzej, Poland, University of California, Berkeley, Sept. 1958-June 1959; Muki, R., 
Japan, Brown University, Sept. 15, 1958-June 30, 1959; Munn, W. D., Scotland, Tulane 
University, Sept. 1958-Sept. 1960; Nagao, Hirosi, Japan, University of Michigan, 1958-1959; 
Nagata, Masayoshi, Japan, Harvard University, Sept. 1, 1958-June 30, 1959; 
Nannini, Amos, Italy, University of Minnesota, Sept. 1958-June 1959; Obata, Morio, 
Japan, University of Illinois, Sept. 1958-Sept. 1959; Odeh, Farouk J. S., Jordan, University 
of California, Berkeley, AY 1956-59; Ogawa, Junjiro, Japan, University of North 
Carolina, August 1958-June 30, 1959; Ohtsuka, M., Japan, University of Kansas, March 
1959-May 1959; Oikawa, Kotaro, Japan, University of California, Los Angeles, July 1, 
1958-June 30, 1959; O’Keeffe, Jeremiah, Ireland, Institute of Mathematical Sciences, 
New York University, Sept. 1958-August 1959; Okubo, Tanjiro, Japan, State College of 
Washington, Sept. 16, 1958-June 10, 1959; Pedersen, Flemming Per, Denmark, University 
of Southern California, July 1958-July 1959; Pleijel, A., Sweden, University of Kansas, 
March 1959-May 1959; Plis, Andrzej, Poland, Institute of Mathematical Sciences, New 
York University, Sept. 1958-Aug. 1959; Proudman, Ian, U. K., California Institute of Technology, 
Sept. 1958-Sept. 1959; Rohrl, Helmut, Germany, University of Chicago, Oct. 1, 
1958-June 30, 1959; Ruben, Harold, England, Columbia University, AY 1957-59; Rutovitz, 
Denis, So. Africa, University of California, Berkeley, Sept. 1958-June 1959; Sabharwal, 
Ranjit Singh, India, University of California, Berkeley, AY 1958-59; Sabidussi, 
Gert O., Austria, Tulane University, Sept. 1955-Sept. 1959; Sakurai, Akira, Japan, 
Institute of Mathematical Sciences, New York University, Sept. 1958-Aug. 1959; Schaefer, 
Helmut, W. Germany, State College of Washington, Mar. 19, 1958-July 1959; Schmidt, 
Paule Frosig, Denmark, University of Minnesota, Sept. 1958-Sept. 1959; Schopf, Andreas, 
Switzerland, American University and National Bureau of Standards, Oct. 1957-June 
1959; Scriba, J. Christoph, Germany, University of Massachusetts, Aug. 1958-Sept. 
1959; Shisha, Oved, Israel, Technion-Israel Institute, Sept. 1, 1958-June 30, 1959; Sioson, 
Frederico M., Philippines, University of California, Berkeley, AY 1956-59; Skovgaard, 
H., Denmark, California Institute of Technology, Jan. 1, 1959-June 30, 1960; Szarski, J., 
Poland, University of Kansas, Oct. 1958-July 1959; Verma, G. R., India, Institute of 
Mathematical Sciences, New York University, Sept. 1958-August 1959; Vogel, Walter, 
Germany, University of Chicago, Oct. 1958-June 1959; Wallace, David, U. K., Princeton 
University, Sept. 1958-Sept. 1959; Walter, Wolfgang, Germany, University of Maryland, 
Sept. 1, 1958-June 30, 1959; Watson, Geoffrey, Australia, Princeton University, 
Sept. 1958-June 1959; Wong, Yung-Chow, Hong Kong, Institute for Advanced Study, 
Sept.-Dec. 1958, University of Chicago, Feb. 1-June 30, 1959; Sept. 1958-June 30, 1959; 
Woods, A. C., England, Tulane University, Sept. 1957-Sept. 1959; Yasuhara, Mitsuru, 
Japan, University of California, Berkeley, AY 1957-59; Yevdjevich, V. M., Yugoslavia, 
American University and National Bureau of Standards, Feb. 1958-June 1959; Young, 
Eutiquio, Philippines, University of Maryland, Sept. 1958-Sept. 1959; Zadunaisky, 
Pedro, Argentina, Princeton University, 1957-1959; Zassenhaus, H. J., Canada, California 
Institute of Technology, Sept. 1, 1958-Aug. 31, 1959. 


——


SOUTHERN REGIONAL GRADUATE SUMMER SESSION IN 
STATISTICS AT NORTH CAROLINA STATE COLLEGE, 1959 


The 1959 session of the Southern Regional Graduate Summer Session in 
Statistics will be held at North Carolina State College, Raleigh, from June 8 to 
July 17, 1959. 

North Carolina State College, Virginia Polytechnic Institute, University of 
Florida, and Oklahoma State University have agreed to operate a continuing 
program of graduate summer sessions in statistics to be held at each institution 
in rotation; the program was instituted at Virginia Polytechnic Institute in the 
summer of 1954. 

The 1959 session, like previous sessions under this program, is intended to 
serve: 1) teachers of introductory statistical courses who want formal training in 
modern statistics; 2) research and professional workers who want intensive 
instruction in basic statistical concepts and modern statistical methodology; 
3) professional statisticians who wish to keep informed about advanced special- 
ized theory and methods; 4) prospective candidates for graduate degrees in 
statistics; and 5) graduate students in other fields who desire supporting work in 
statistics. 

The session will last six weeks and courses will carry three semester hours of 
credit. Not more than two courses may be taken for credit at any one session. 
The summer work in statistics may be applied as residence credit at any of the 
cooperating institutions, as well as certain other universities, in partial fulfillment 
of the requirements for a graduate degree. The program may be entered at any 
session, and consecutive courses will follow in successive summers so that it 
would be possible for a student to complete the course work in statistics for a 
Master’s degree in three summers. Students must satisfy the remaining require- 
ments for course work and thesis at the institution where they are to be admitted 
to candidacy. The advanced courses may be accepted as part of the Ph.D. pro- 
gram of the participating institutions. 

The National Science Foundation is offering grants to college teachers of intro- 
ductory statistics who wish to attend the 1959 session. Stipends of $75 per week 
for the six weeks of the session plus $15 per week per dependent (up to four) 
will be made available for a maximum of 30 applicants; in addition, there will be 
a travel allowance of 4 cents per mile, round trip. Tuition and fees will be paid 
by the National Science Foundation. Participants will normally be enrolled in 
classes for graduate credit. 

The courses to be offered in statistics in 1959 at Raleigh are as follows: Statisti- 
cal Methods I and II, Statistical Theory I, II, and III, Theory of Sampling 
Applied to Survey Design, Stochastic Processes and Their Applications, Meth- 
ods of Operations Research, Advanced Topics in Statistical Methods. 

A number of courses will, in addition, be available in the Mathematics Department. 
Courses to be offered during the June 8-July 17 period of the Summer 
Session in Statistics include: Differential Equations, Introduction to Deter- 
minants and Matrices, Introduction to Applied Mathematics, Numerical 
Analysis, Advanced Calculus I, Boundary Value Problems. 

(For students interested in the regular 12-week summer session at North 
Carolina State College, additional Mathematics courses are available: Advanced 
Calculus II and Advanced Differential Equations will be offered in the second 
6-week session beginning July 20; Complex Variable Theory and Vector Spaces 
and Matrices will be offered over the 12 weeks of both sessions.) The formal 
course offerings will be supplemented by seminars and special lectures. 

Teachers under National Science Foundation grants will probably be inter- 
ested primarily in the Statistical Methods and Statistical Theory sequences 
and in Sampling. 

Applicants for National Science Foundation grants will be selected on the 
basis of interest in continued teaching of statistics, evidence of excellence as a 
teacher, previous academic record of the applicant, number of introductory 
statistics courses now teaching, and number of students contacted. Applications 
must be received not later than February 16, 1959 to be assured of full considera- 
tion. Applicants will be notified of the selection committee’s action not later 
than March 16, 1959, and must accept (or decline) a fellowship award not later 
than April 1, 1959. 

Requests for application blanks for the summer school and for National 
Science Foundation grants should be addressed to: F. E. McVay, Department 
of Experimental Statistics, North Carolina State College, Raleigh, North 
Carolina. 


——


REPORT OF THE MONTEREY, CALIFORNIA, MEETING OF 
THE INSTITUTE OF MATHEMATICAL STATISTICS 


The seventy-ninth meeting of The Institute of Mathematical Statistics, a 
Western Region Meeting, was held in Spanagel Hall on the campus of the 
United States Naval Postgraduate School at Monterey, California, on Novem- 
ber 14-15, 1958. 

The Chairman of the Program Committee was Richard Link. David Stoller 
acted as chairman in his absence. 







Sixty-one people registered for the meeting. Forty-six members of the Insti- 
tute attended. 


The program for the meeting was as follows: 


FRIDAY, NOVEMBER 14, 1958 


Welcoming Remarks, Rear Admiral E. E. Yeomans, Superintendent, U. S. Naval Postgraduate 
School. 
10:00-12:00 a.m. Special Topics (I) 
Chairman: Charles B. Bell, San Diego State College. 
1. Multivariate Tchebycheff Inequalities, Ingram Olkin, Michigan State University. 
2. Evaluation of Extreme Tail Probabilities, David Blackwell and Joseph Hodges, 
University of California. 
1:30-2:30 p.m. Multiple Comparisons and Decisions 
Chairman: Fred C. Andrews, University of Oregon. 
1. Multiple Decision-Ranking Procedures, Robert E. Bechhofer, Cornell University. 
2. Estimation of the Means of Dependent Variables, Olive Jean Dunn, Iowa State College 
(now at University of California, Los Angeles). 
2:45-4:00 p.m. Special Topics (II) 
Chairman: Olive Jean Dunn, Univ. of California, Los Angeles. 
1. Statistical Problems in Radio Wave Propagation, William C. Hoffman, RAND 
Corporation. 
2. Stochastic Models for the Electron Multiplier Tube, Howard G. Tucker, University 
of California, Riverside. 
3. On a Multicompartment Migration Model with Chronic Feeding: Preliminary Report, 
D. Wiggins, Hanford Laboratory. 


SATURDAY, NOVEMBER 15, 1958 


10:00-12:00 a.m. Experimental Design and Analysis 
Chairman: David S. Stoller, The RAND Corporation. 
1. Sequential Design of Experiments, Herman Chernoff, Stanford Univ. 
2. Tests Associated with Poisson Processes, Benjamin Epstein, Wayne State University. 
1:30-3:30 p.m. Contributed Papers 
Chairman: Herman Rubin, University of Oregon. 
1. On Computing Expectations in Sequential Analysis, Fred C. Andrews, University 
of Oregon, and J. R. Blum, Indiana University. 
2. Exact Nonparametric Tests for Randomized Blocks, John E. Walsh, Systems Development 
Corporation. (By title) 
3. On the Determination of Joint Distributions from the Marginal Distributions of Linear 
Combinations, Thomas S. Ferguson, University of California, Los Angeles. 
4. Approximation to the Probability Density of Zero-Crossing Intervals of a Gaussian 
Process, Sylvain Ehrenfeld, New York University. (By title) 
5. The Probability in the Extreme Tail of a Convolution, David Blackwell and J. L. 
Hodges, Jr., Univ. of California. (By title) 
6. Asymptotic Methods of Evaluating the Integral from a to ∞ of f(z), Wyman Richardson, 
Univ. of North Carolina. (By title) 
7. Some Properties of Binary Arrays which are Generated by Iterated Sequences and Reversals, 
H. von Guerard, Lockheed Aircraft Corporation. 


GERALD J. LIEBERMAN 
Associate Secretary 







FINAL REPORT OF THE EDITORS OF THE ANNALS FOR 1958 


The Annals is indebted to the following people who have generously given 
refereeing assistance: G. Albert, T. W. Anderson, E. Barankin, G. A. Barnard, 
G. Baxter, V. E. Benes, A. Birnbaum, D. Blackwell, J. Blum, R. Blumenthal, 
C. Blyth, L. Breiman, R. Bradt, K. A. Bush, V. J. Chacko, D. Champernowne, 
H. Chernoff, A. Clarke, A. C. Cohen, W. S. Connor, J. Cornfield, D. R. Cox, 
H. David, K. Dawson, R. Dawson, L. E. Dubins, A. Duncan, Olive J. Dunn, 
J. Durbin, P. S. Dwyer, G. Elfving, B. Epstein, T. Ferguson, G. H. Freeman, 
S. G. Ghurye, I. J. Good, B. Greenberg, U. Grenander, F. Graybill, F. Grubbs, 
E. J. Gumbel, S. S. Gupta, J. Hannan, Nancy Lee Hannye, M. Hansen, J. L. 
Hodges, Jr., W. Hoeffding, H. Hotelling, S. Hunter, G. S. James, A. Joffee, G. 
M. Jenkins, V. Johns, M. Juncosa, L. Katz, E. S. Keeping, D. G. Kendall, 
S. Kullback, R. Laha, L. LeCam, R. A. Leibler, E. Lukacs, A. Madansky, W. 
Madow, C. Mallows, T. K. Matthes, J. McGregor, B. McMillan, D. Mesner, 
A. Mood, R. B. Murphy, P. E. Ney, I. Olkin, E. Parzen, M. P. Peisakoff, K. 
Pillai, J. W. Pratt, W. E. Pruitt, R. Pyke, C. R. Rao, E. Reich, J. Riordan, 
D. Robson, Joan Rosenblatt, M. Rosenblatt, S. N. Roy, H. Rubin, J. Sacks, I. 
R. Savage, L. J. Savage, Rosedith Sitgreaves, J. L. Snell, M. Sobel, C. M. Stein, 
Charlotte Striebel, J. C. Tanner, R. F. Tate, H. Teicher, D. Teichroew, A. J. 
Thomasian, D. R. Truax, S. Vajda, H. R. Van der Vaart, D. Votaw, D. L. 
Wallace, L. Wegner, B. L. Welch, O. Wesler, R. F. White, R. Wijsman, M. 
B. Wilk, E. Williams, D. G. Wishart, A. Wortham, Ying Yao, M. Zelen. 

T. E. Harris, Editor (before July 1, 1958) 
William Kruskal, Editor (after June 30, 1958) 

December 3, 1958 


——


PUBLICATIONS RECEIVED 


Moore, Geoffrey H., Measuring Recessions, Occasional Paper 61, National Bureau of Eco- 
nomic Research, Inc., New York, 1958 (Reprinted from the June 1958 issue of the Jour- 
nal of the American Statistical Association), $1.00. 








BIOMETRIKA 


Volume 45, Parts 3 and 4          Contents          December 1958 


Memoirs: 
Bayes, Thomas. An essay towards solving a problem in the doctrine of chances. [Reproduced from Phil. Trans. 
Roy. Soc. 1763, 53, 370-418]. Studies in the history of probability and statistics. IX. With a biographical 
note by G. A. Barnard. Leslie, P. H. and Gower, J. C. The properties of a stochastic model for two competing 
species. Tanner, J. C. A problem in the combination of accident frequencies. Darroch, J. N. The 
multiple-recapture census. I. Estimation of a closed population. Bulmer, M. G. Confidence intervals for 
distance in the analysis of variance. ... alternative estimators for an ... equation. Patterson, H. D. The 
use of autoregression in fitting an exponential curve. Haight, Frank A. Two queues in parallel. Shenton, 
L. R. Moment estimators and maximum likelihood. Srivastava, A. B. L. Effect of non-normality on the power 
function of t-test. Pearson, E. S. Note on Mr. Srivastava’s paper on the power function of Student’s test. 
Barton, D. E. and Casley, D. J. A quick estimate of the regression coefficient. Zinger, A. and St-Pierre, J. 
On the choice of the best amongst three normal populations with known variances. Dronkers, J. J. 
Approximate formulae for the statistical distributions of extreme values. Hooper, J. W. The sampling 
variance of correlation coefficients under assumptions of fixed and mixed variates. Johnson, N. L. The mean 
deviation, with special reference to samples from a Pearson Type III population. Merrington, Maxine and 
Pearson, E. S. An approximation to the distribution of non-central t. Foster, F. G. Upper percentage points 
of the generalized beta distribution. III. Mendenhall, William and Hader, R. J. Estimation of parameters 
of mixed exponentially distributed failure time distributions from censored life test data. Mendenhall, 
William. A bibliography on life testing and related topics. 

Miscellanea: Contributions by D. E. Barton & F. N. David, D. R. Cox, Edwin L. Crow, ... & M. G. Kendall, 
Irwin Guttman, D. G. Kabe, Roy Leipnik, Rita Maurice, T. A. ..., S. N. Roy & R. Gnanadesikan, 
M. Sankaran, B. V. Sukhatme 


Corrigenda: N. L. Johnson, A. R. Kamat, J. Saw 


Reviews          Other Books received 

The subscription, payable in advance, is now 54s. (or $8.00) per volume (including postage). Cheques should 
be made payable to Biometrika, crossed “a/c Biometrika Trust” and sent to the Secretary, Biometrika Office, 
Department of Statistics, University College, London, W.C.1. All foreign cheques must be drawn on a bank 
having a London agency. 

Issued by THE BIOMETRIKA OFFICE, University College, London 





ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 26, No. 4, October 1958 


Hollis B. Chenery and Tsunehiko Watanabe. International Comparison of the Structure of Production 
Kenneth J. Arrow and Leonid Hurwicz. On the Stability of the Competitive Equilibrium, I 
L. R. Klein. The Estimation of Distributed Lags 
Irving Hoch. Simultaneous Equation Bias in the Context of the Cobb-Douglas Production Function 
Jan Tinbergen. ... de Bernard Chait à l’économétrie 
Harold T. Davis. Charles Frederick Roos 
Maurice Fréchet. Letter to the Editor 
Report of the Philadelphia Meeting 


Book Reviews: 


The Accumulation of Capital (Joan Robinson). Review by L. R. Klein. International Economic Papers, No. 6 
(International Economic Association). Review by Abba P. Lerner. Insurance and Economic Theory (Irving 
Pfeffer). Review by Paul Boschan. Essai sur la théorie générale de la monnaie: économique rationnelle 
(A. Aupetit). Review by Léon Dupriez. Tables of the Non-Central t-Distribution (Density Function, Cumulative 
Distribution Function and Percentage Points) (George J. Resnikoff and Gerald J. Lieberman). Review by N. 
L. Johnson. The United States Capital Position and the Structure of Its Foreign Trade (M. A. Diab). Review 
by R. E. Caves. Numerical Analysis (Zdenek Kopal). Review by H. E. Goheen. Economic Policy: Principles 
and Design (J. Tinbergen). Review by Arthur Smithies. Second Symposium in Linear Programming, 
Volumes I and II (H. A. Antosiewicz, ed.). Review by Robert Dorfman. Structures et Cycles Économiques, 
Tome II (Johan Akerman). Review by Henri Guitton. Théorie des fonctions aléatoires (A. Blanc-Lapierre 
and R. Fortet). Review by L. Schmetterer. Selected Papers in Statistics and Probability by Abraham Wald. 
Review by Alan Stuart. Análisis de la Demanda: Un Estudio de Econometría (Herman Wold and Lars Juréen). 
Review by A. Alcaide Inchausti. Geographic Differentials of Agricultural Wages in the United States (Willis 
D. Weatherford). Review by H. Gollnick. Price, Cost, and Output (P. J. D. Wiles). Review by J. A. Nordin. 
Mathematische Statistik (van der Waerden). Review by J. Pfanzagl. Fondements d’une théorie positive des choix 
comportant un risque et critique des postulats et axiomes de l’école américaine (M. Allais). Review by 
Eberhard Fels. 


Announcements and Notes 











JOURNAL OF THE 
ROYAL STATISTICAL SOCIETY 


Series B (Methodological) 


30s. per part          Vol. 20, No. 2, 1958          Ann. Sub. incl. post £3 2s. 0d. 


CONTENTS 


The Regression Analysis of Binary Sequences. By D. R. Cox. (With Discussion.) 
Renewal Theory and its Ramifications. By Walter L. Smith. (With Discussion.) 
On Asymptotically Efficient Consistent Estimates of the Spectral Density Function of a Stationary 
Time Series. By Emanuel Parzen. 
The Estimation of the Spectral Density after Trend Removal. By E. J. Hannan. 
On the Smoothing of Probability Density Functions. By P. Whittle. 
Experiments with Mixtures. By Henry Scheffé. 
The Interaction Algorithm and Practical Fourier Analysis. By I. J. Good. 
Equally Correlated Variates and the Multinormal Integral. By Alan Stuart. 
Formulae for Calculating the Operating Characteristic and the Average Sample Number of 
Some Sequential Tests. By K. W. Kemp. 
On Corrections to the Chi-Squared Distribution. By J. H. Darwin. 
Sampling without Replacement with Probability Proportional to Size. By W. L. Stevens. 
Multivariate Quantal Analysis. By P. J. Claringbold. 
A Concise Derivation of General Orthogonal Polynomials. By C. P. Cox. 
A Simplified Model for Delays in Overtaking on a Two-Lane Road. By J. C. Tanner. 


The Royal Statistical Society, 21, Bentinck Street, London, W. 1 





Vol. XIX - N. 1-2 


Contents 


Corrado Gini. Logic in Statistics 
Corrado Gini. Gerolamo Cardano ed i fondamenti del calcolo delle probabilità 
A. R. Kamat. Contributions to the theory of statistics based on the first and second successive differences 
C. Gini, C. Viterbo, C. Benedetti, A. Herzel. Problemi di transvariazione inversa 
Carlo Benedetti. Il coefficiente di correlazione del Bravais come funzione non moltiplicativa 
Amato Herzel. Influenza del raggruppamento in classi sulla probabilità e sull’intensità di transvariazione 
Z. W. Birnbaum. On an inequality due to S. Gatti 
M. de Novellis. Some applications and developments of Gatti-Birnbaum inequality 

Manuscripts submitted for publication should be addressed to Prof. CORRADO GINI, 
Università di Roma (Italy), Via delle Terme di Diocleziano, 10. The Editors will not be 
responsible for the safe return of the original. 













